TATA-BINDING PROTEIN ASSOCIATED FACTORS, NUCLEIC ACIDS
ENCODING TAFS, AND METHODS OF USE

The research carried out in the subject application was supported in part by
grants from the National Institutes of Health. The government may have rights in
any patent issuing on this application.

CROSS-REFERENCE TO RELATED APPLICATION

This Application is a continuation-in-part of Application Serial No.
08/087,119 filed June 30, 1993, which is a continuation-in-part of Application
Serial No. 08/013,412 filed January 28, 1993.

INTRODUCTION

Technical Field

The technical field of this invention concerns TATA-binding protein
associated factors, proteins involved in gene transcription.

Background

Gene transcription requires the concerted action of a number of molecules.
DNA provides regulatory sequences and a coding sequence, or template, from
which an RNA polymerase synthesizes corresponding RNA. Regulatory sequences
generally include sites for sequence-specific transcriptional control, including
promoters, enhancers, suppressors, etc., and also a site for transcription initiation.
For review, see Mitchell and Tjian (1989), Science 245, 371-378.

RNA polymerases alone appear incapable of initiating transcription.
However, in vitro transcriptional activity of RNA polymerases can be restored by
the addition of nuclear extracts or fractions thereof. For example, under certain
conditions, in vitro transcription by RNA polymerase II (Pol II) can be at least
partially restored by the addition of what have variously been reported to be four,
five, six or seven nuclear fractions (see, e.g., Matsui et al. (1980), J Biol Chem 255,
11992), herein referred to as TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH and
TFIIJ. Pol I and Pol III each appear to require at least two fractions, called respectively
SL1 and UBF, and TFIIIA and TFIIIB.

Many of these transcription fractions remain only partially characterized.
For example, all but one of the Pol II fractions remain incompletely characterized
or comprise multiple components. The fractions TFIID, SL1 and TFIIIB have been
reported to contain a TATA-binding component, henceforth TATA-binding
protein, or TBP. Groups of the present Applicants have reported anti-TBP
antibodies capable of immunoprecipitating TBP from TFIID, SL1, and TFIIIB.
TFIID, SL1 and TFIIIB immunoprecipitates have revealed TBP and
numerous associated factors, tentatively called TBP-associated factors, or TAFs.
Furthermore, preliminary experiments indicated that the TBP and non-TBP (TAF)
fractions, when combined, facilitated at least some sequence-specific transcriptional
activation.

Unfortunately, it is not clear from the above art that there is any
transcriptional activity in the non-TBP fractions of TFIID, SL1 or TFIIIB
immunoprecipitates. For example, the reported apparent functional
complementarity of the TBP and non-TBP fractions might result from the influence
of antirepressors, inhibition of inhibitors, etc. Furthermore, the coactivator
transcriptional activity attributed to the non-TBP fractions could result from one or
more components unrelated to the electrophoretically resolved TAF components.
Nor does the literature provide any suggestion as to which, if any, of the
electrophoretically resolved components of the non-TBP fraction provide(s)
transcriptional activity, nor means for identifying bands resolvable from the
non-TBP fractions.



Relevant Literature

Pugh and Tjian (1990), Cell 61:1187-1197; Tanese et al. (1991), Genes and
Devel 5:2212-2224; Pugh and Tjian (1991), Genes and Devel 5:1935-1945;
Dynlacht et al. (1991), Cell 66:563-576; Timmers et al. (1991), Genes and Devel
5:1946-1956; Zhou et al. (1992), Genes and Devel 6:1964-1974; and Takada et al.
(1992), Proc Natl Acad Sci USA 89:11809-11813, relate to factors associated with
Pol II transcription. Comai et al. (1992), Cell 68:965-976, relates to factors
associated with Pol I transcription. Lobo et al. (1991), Genes and Devel 5:1477-1489;
Margotin et al. (1991), Science 251:424-426; Simmen et al. (1991), EMBO J
10:1853-1862; Taggart et al. (1992), Cell 71:1015; Lobo et al. (1992), Cell
71:1029; and White and Jackson (1992), Cell 71:1041 relate to factors associated
with Pol III transcription. Sekiguchi et al. (1988), EMBO J 7:1683-1687 and
Sekiguchi et al. (1991), Mol and Cellular Biol 11:3317-3325 disclose the cloning
of the CCG1 gene encoding a protein reported to be involved in cell cycle
progression.

SUMMARY OF THE INVENTION
Substantially pure and biologically active TATA-binding protein associated
factors (TAFs), eukaryotic nuclear proteins involved in RNA polymerase I, II, and
III transcription, nucleic acids encoding TAFs, and methods of using TAFs and
TAF-encoding nucleic acids are provided. Recombinant TAFs, anti-TAF
antibodies and TAF-fusion products find use in drug screening, diagnostics and
therapeutics. In particular, the disclosed TAFs provide valuable reagents in
developing specific biochemical assays for screening compounds that agonize or
antagonize selected transcription factors involved in regulating gene expression
associated with human pathology.

DESCRIPTION OF SPECIFIC EMBODIMENTS
Substantially pure and biologically active TATA-binding protein associated
factors (TAFs) and portions thereof, nucleic acids encoding TAFs and portions
thereof, and methods of use are provided.

As used herein, a given TAF refers to the TAF protein, recombinant or
purified from a natural source, and functional and xenogeneic analogs thereof. For
example, "dTAFII110" refers to a Pol II TAF, derivable from Drosophila, with an
apparent molecular weight of about 110 kD, generally as determined by
SDS-PAGE under conditions described herein or in Dynlacht et al. (1991) or Comai et al.
(1992), or as otherwise identified by functional, sequence or other data herein. It is
understood that these molecular weight designations are for the convenience of
nomenclature and may not necessarily correspond to actual or predicted molecular
weight. Other TAFs are analogously identified herein.

A "portion" of a given TAF is a peptide comprising at least about a six, 
preferably at least about an eighteen, more preferably at least about a thirty-six 
amino acid sequence of the TAF. Of particular interest are portions of the TAF
that facilitate functional or structural interaction with activators, TAFs, TBP, Pol I,
II or III, the TATA box and surrounding DNA sequences, etc. Methods for
identifying such preferred portions are described below.

By substantially full-length is meant a polypeptide or polynucleotide that 
comprises at least 50%, preferably at least 70% and more preferably at least 90%
of the natural TAF polypeptide or polynucleotide length.
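
The length thresholds in the "portion" and "substantially full-length" definitions above can be expressed as a small classifier. This is a minimal sketch, assuming the "at least about" cutoffs are applied as exact numeric thresholds and that the native TAF length is known; `classify_fragment` and the 1,000-residue example are hypothetical and not taken from this disclosure.

```python
# Hypothetical helper. Thresholds follow the definitions of a TAF "portion"
# (at least ~6, ~18, ~36 residues) and "substantially full-length"
# (at least ~50%, ~70%, ~90% of native length); "about" is treated as exact.
FULL_LENGTH_TIERS = [
    (0.90, "substantially full-length (more preferred)"),
    (0.70, "substantially full-length (preferred)"),
    (0.50, "substantially full-length"),
]
PORTION_TIERS = [
    (36, "portion (more preferred)"),
    (18, "portion (preferred)"),
    (6, "portion"),
]


def classify_fragment(fragment_len: int, native_len: int) -> str:
    """Classify a TAF-derived polypeptide by its length in residues."""
    if native_len <= 0 or fragment_len < 0 or fragment_len > native_len:
        raise ValueError("expected 0 <= fragment_len <= native_len and native_len > 0")
    fraction = fragment_len / native_len
    # Check the full-length tiers first, from most to least stringent.
    for cutoff, label in FULL_LENGTH_TIERS:
        if fraction >= cutoff:
            return label
    # Otherwise fall back to the absolute-length "portion" tiers.
    for cutoff, label in PORTION_TIERS:
        if fragment_len >= cutoff:
            return label
    return "below portion threshold"


# Illustrative lengths only (hypothetical 1,000-residue TAF).
print(classify_fragment(950, 1000))  # substantially full-length (more preferred)
print(classify_fragment(20, 1000))   # portion (preferred)
```

Checking the relative (percent-of-native) tiers before the absolute residue counts keeps a near-complete clone from being reported merely as a "portion".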

"Xenogeneic" TAF analogs are nonhuman-, nonDrosophila-derived proteins 
with substantial functional or sequence identity to human and Drosophila TAFs.
Of particular interest are xenogeneic TAF analogs derived from rodents, primates,
and livestock animals including bovine, ovine, equine and avian species.

"Functional" analogs of a given TAF or proteins with "substantial 
functional identity" to a given TAF are compounds that exhibit one or more 
biochemical properties specific to such TAF, such as the ability of dTAFII110 to
interact with Sp1.

"Modulating transcription" means altering transcription, and includes
changing the rate of transcription initiation, the level of transcription, or the
responsiveness of transcription or transcription initiation to regulatory controls.

The terms "substantially pure" or "isolated" mean that the TAF, TAF
portion, or nucleic acid encoding a TAF or TAF portion is unaccompanied by at
least some of the material with which it is normally associated in its natural state.
While a composition of a substantially pure TAF or portion thereof is preferably
substantially free of polyacrylamide, such composition may contain excipients and
additives useful in diagnostic, therapeutic and investigative reagents. A
substantially pure TAF composition subjected to electrophoresis or reverse phase
HPLC provides such TAF as a single discernible proteinaceous band or peak.

Generally, a substantially pure TAF composition is at least about 1% said TAF
by protein weight; preferably at least about 10%; more preferably at least
about 50%; and most preferably at least 90%. Protein weight percentages are
determined by dividing the weight of the TAF or TAF portion present in a fraction,
including alternative forms and analogs of the TAF such as proteolytic breakdown
products and alternatively spliced, differentially phosphorylated or glycosylated, or
otherwise post-translationally modified forms of the TAF, by the total
protein weight present.
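
The protein weight percentage defined above is a simple ratio in which every listed form of the TAF counts toward the numerator. A minimal sketch, assuming all weights are measured in a common unit and treating the "about" tiers as exact; the function names and example amounts are hypothetical. The same arithmetic applies to the nucleic acid weight percentages defined for TAF-encoding nucleic acids.

```python
from typing import Iterable


def percent_by_weight(component_weights: Iterable[float], total_weight: float) -> float:
    """Percent of total weight contributed by all listed forms of one component
    (e.g. full-length TAF plus fragments, splice variants, modified forms)."""
    if total_weight <= 0:
        raise ValueError("total_weight must be positive")
    component_total = sum(component_weights)
    if component_total < 0 or component_total > total_weight:
        raise ValueError("component weights must lie between 0 and total_weight")
    return 100.0 * component_total / total_weight


def purity_tier(percent: float) -> str:
    """Map a weight percentage onto the tiers recited in the text."""
    if percent >= 90:
        return "most preferred (>= 90%)"
    if percent >= 50:
        return "more preferred (>= 50%)"
    if percent >= 10:
        return "preferred (>= 10%)"
    if percent >= 1:
        return "substantially pure (>= 1%)"
    return "below 1%"


# Hypothetical fraction: 40 ug full-length TAF + 5 ug proteolytic fragments
# + 3 ug phosphorylated form, in 100 ug total protein.
pct = percent_by_weight([40.0, 5.0, 3.0], 100.0)
print(pct, purity_tier(pct))  # 48.0 preferred (>= 10%)
```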

A biologically active TAF or TAF portion retains one or more of the
TAF's native functions, such as the ability to specifically bind TBP, transcription
factors (activators), other TAFs or anti-TAF antibodies, or to modulate or facilitate
transcription or transcription initiation. Exemplary assays for biological activity
are described below and in the working exemplification.

Specific binding is empirically determined by contacting, for example, a
TAF with a mixture of components and identifying those components that
preferentially bind the TAF. Specific binding may be conveniently shown by
competitive binding studies, for example, by immobilizing a TAF on a solid matrix
such as a polymer bead or microtiter plate and contacting the immobilized TAF
with a mixture. Often, one or more components of the mixture will be labelled.
Another useful approach is to displace labelled ligand. Generally, specific binding
of a TAF will have a binding affinity of 10^-6 M, preferably 10^-8 M, more preferably
10^-10 M under optimized reaction conditions and temperature.
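
The affinity tiers can be related to how much of a TAF is occupied at a given ligand concentration. This sketch reads the recited affinities as equilibrium dissociation constants (Kd) of 10^-6, 10^-8 and 10^-10 M, which is an interpretation, and uses the standard single-site occupancy relation theta = [L] / ([L] + Kd). The 10 nM free-ligand concentration and the function name are hypothetical choices for illustration.

```python
def fraction_bound(free_ligand_m: float, kd_m: float) -> float:
    """Fractional occupancy for simple 1:1 binding: theta = [L] / ([L] + Kd).

    Assumes a single independent site at equilibrium and a known free ligand
    concentration (e.g. ligand in large excess over the immobilized TAF).
    """
    if free_ligand_m < 0 or kd_m <= 0:
        raise ValueError("need free_ligand_m >= 0 and kd_m > 0")
    return free_ligand_m / (free_ligand_m + kd_m)


# Occupancy at a hypothetical 10 nM free ligand for the three affinity tiers.
for kd in (1e-6, 1e-8, 1e-10):
    print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(1e-8, kd):.3f}")
```

At 10 nM free ligand, occupancy rises from roughly 1% to 50% to roughly 99% across the three tiers.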

Portions of TAFs find use in screening TAF expression libraries, defining
functional domains of TAFs, identifying compounds that bind or associate with
TAFs and the like. Accordingly, peptides comprising a portion of a TAF are
provided that are capable of modulating transcription, including transcription
initiation. Typically, such peptides are effective by binding to a TAF, an
activator, or TBP, or by competitively inhibiting a TAF domain's association with
another compound, typically a protein like TBP, another TAF or an activator, or
DNA. For example, TAF-TAF interactions may be exploited to purify TAFs, e.g.
immobilized TAF250 is used to purify TAF110.

Associational domains of TAFs are ascertainable by those skilled in the art
using the methods and compositions disclosed herein. Useful methods include in
vitro mutagenesis such as deletion mutants, secondary and tertiary structural
predictions, antibody and solvent accessibility, etc. For example, peptides derived
from highly charged regions find particular use as immunogens and as modulators
of TAF-protein interactions. Also, TAF mutants are used to identify regions
important for specific protein interactions or otherwise involved in transcription.
Here, useful assays include column binding assays and transfection studies.

The invention provides recombinantly produced TAFs, TAF analogs and
portions thereof. These recombinant products are readily modified through
physical, chemical, and molecular techniques disclosed or cited herein or otherwise
known to those skilled in the relevant art. According to a particular embodiment
of the invention, portions of the TAF-encoding sequences are spliced with
heterologous sequences to produce fusion proteins. Such fusion proteins find
particular use in modulating gene transcription in vitro and in vivo.

For example, many eukaryotic sequence-specific transcription factors have
separable DNA binding and activation domains. A TAF or domain thereof can be
fused to a well-characterized DNA binding domain (see, e.g., Sadowski et al.
(1988), Nature 335, 563-564) and the resulting fusion protein can be tested for its
ability to modulate transcription or transcriptional initiation. For example, we
disclose the fusion of the N-terminal region of TAF110 to the DNA binding
domain of the GAL4 protein. Alternatively, a TAF domain can be fused with a
domain having endonuclease activity for site-specific DNA cleavage. Other useful
TAF fusion partners include GST, the Lerner epitope, an epitope recognized by a
monoclonal antibody (e.g. the hemagglutinin epitope and 12CA5 monoclonal
antibody), glutathione S-transferase for immobilization, the Sp1 or VP16 activation
domains, etc.

TAFs can be further modified by methods known in the art. For example,
TAFs may be phosphorylated or dephosphorylated, glycosylated or deglycosylated,
with or without radioactive labeling, etc. The disclosed TAF serine residues in
particular provide useful phosphorylation sites. See, e.g., methods disclosed in
Roberts et al. (1991), Science 253, 1022-1026 and in Wegner et al. (1992), Science
256, 370-373. Especially useful are modifications that alter TAF solubility,
membrane transportability, stability, and binding specificity and affinity. Some
examples include fatty acid acylation, proteolysis, and mutations in TAF-TAF or
TAF-TBP interaction domains that stabilize binding.

TAFs may also be modified with a label capable of providing a detectable
signal, either directly or indirectly, for example, at a heart muscle kinase labeling site.
Exemplary labels include radioisotopes, fluorescers, etc. Alternatively,
a TAF may be expressed in the presence of a labelled amino acid such as
35S-methionine. Such labeled TAFs and analogs thereof find use, for example, as
probes in expression screening assays for proteins that interact with TAFs, or
for detecting TAF binding to other transcription factors in drug screening assays.

Specific polyclonal or monoclonal antibodies that can distinguish TAFs
from other nuclear proteins are conveniently made using the methods and
compositions disclosed in Harlow and Lane, Antibodies, A Laboratory Manual,
Cold Spring Harbor Laboratory, 1988, other references cited herein, as well as
immunological and hybridoma technologies known to those in the art. In
particular, TAFs and analogs and portions thereof also find use in raising anti-TAF
antibodies in laboratory animals such as mice and rabbits, as well as in the production
of monoclonal antibodies by cell fusion or transformation.

Anti-TAF antibodies and fragments (Fab, etc.) thereof find use in
modulating TAF involvement in transcription complexes, screening TAF
expression libraries, etc. In addition, these antibodies can be used to identify,
isolate, and purify structural analogs of TAFs. Anti-TAF antibodies also find use
for subcellular localization of TAFs under various conditions such as infection,
during various cell cycle phases, induction with cytokines, protein kinases such as
C and A, etc. Other exemplary applications include using TAF-specific antibodies
(including monoclonal or TAF-derived peptide specific antibodies) to
immunodeplete in vitro transcription extracts and using immuno-affinity chromatography to
purify TAFs, including analogs, or other nuclear factors which interact with TAFs.

Compositions are also provided for therapeutic intervention in disease, for
example, by modifying TAFs or TAF-encoding nucleic acids. Oligopeptides can
be synthesized in pure form and can find many uses in diagnosis and therapy.
These oligopeptides can be used, for example, to modulate native TAF interaction
with other TAFs, TBP, other transcription factors or DNA. The oligopeptides will
generally be more than six and fewer than about 60 amino acids, more usually
fewer than about 30 amino acids, although larger oligopeptides may be employed.
A TAF or a portion thereof may be used in purified form, generally greater than
about 50%, usually greater than about 90% pure. Methods for purifying such
peptides to such purities include various forms of chromatographic, chemical, and
electrophoretic separations disclosed herein or otherwise known to those skilled in
the art.

Experimental methods for purifying TAFs are set out briefly below and in
detail in the following working exemplification. Generally, TBP-TAF complexes
are immunopurified (generally, by immunoprecipitation) using polyclonal or
monoclonal antibodies directed against a native TAF or TBP epitope.
Alternatively, monoclonal antibodies directed against an epitope-tagged TBP or
TAF may be used. See, e.g., Zhou et al. (1992). At least three complementary
experimental approaches are employed for isolating cDNAs encoding TAFs: (1)
TAF-specific binding proteins (e.g. antibodies directed against TAF proteins,
TAF-binding TAFs, TBP, TAF-binding activators, or TAF-binding coactivators)
are used for screening expression libraries; (2) cDNA libraries are screened with
potentially homologous TAF oligonucleotide sequences (alternatively, a series of
degenerate oligonucleotide PCR primers derived from the homologous TAF
sequence may be used to amplify probes from cDNA; see Peterson et al. (1990),
Science 248, 1625-1630, Figure 1); and (3) TAF proteins are purified to
homogeneity for protein microsequencing.
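
Approach (2) relies on degenerate oligonucleotides back-translated from a peptide sequence. A minimal sketch of that back-translation, using the standard genetic code and IUPAC ambiguity codes; the peptide "MWDK" and the function name `degenerate_primer` are hypothetical and not derived from any TAF sequence in this disclosure.

```python
from __future__ import annotations

from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, "*" marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primer(peptide: str) -> tuple[str, int]:
    """Back-translate a peptide into a degenerate sense primer (5'->3').

    Returns the IUPAC primer string and its degeneracy (number of distinct
    sequences in the mixture). Bases are pooled per codon position, so
    six-codon residues (Leu, Ser, Arg) are over-represented.
    """
    primer, degeneracy = [], 1
    for aa in peptide.upper():
        codons = [c for c, a in CODON_TABLE.items() if a == aa]
        if not codons or aa == "*":
            raise ValueError(f"unsupported residue {aa!r}")
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer("MWDK"))  # ('ATGTGGGAYAAR', 4)
```

Because bases are pooled per codon position, Leu yields YTN (8 sequences rather than its 6 codons); degenerate primers are therefore usually designed over stretches rich in low-degeneracy residues such as Met and Trp.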

TAF-ENCODING NUCLEIC ACID

The invention provides nucleic acid sequences encoding TAFs and portions
of TAFs. "Encoding a portion of a TAF" is meant to include sequences
substantially identical to sequences encoding at least a portion of a TAF. Included
are DNA and RNA sequences, sense and antisense.

"Substantial sequence identity" means that a portion of the protein or
nucleic acid presents at least about 70%, more preferably at least about 80%, and
most preferably at least about 90% sequence identity with a TAF sequence portion.
Where the sequence diverges from native TAF sequences disclosed herein, the
differences are preferably conservative, i.e. an acidic for an acidic amino acid
substitution or a nucleotide change providing a redundant codon. Dissimilar
sequences are typically aggregated within regions rather than being distributed
evenly over the polymer.
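
The identity tiers and the notion of a conservative substitution can be checked on a pair of already-aligned sequences. A minimal sketch, assuming the alignment is given and that gapped columns are excluded (one convention among several); only the acidic group (D, E) comes from the text, and the other substitution groups, the function names and the example sequences are assumptions for illustration.

```python
from __future__ import annotations

# Only D/E (acidic for acidic) is named in the text; the remaining groups are
# a common textbook grouping, used here as an assumption.
CONSERVATIVE_GROUPS = [set("DE"), set("KRH"), set("ILVM"), set("FYW"), set("ST"), set("NQ")]


def compare_aligned(query: str, reference: str) -> tuple[float, int]:
    """Percent identity and number of conservative substitutions for two
    pre-aligned protein sequences of equal length. Columns with a gap ("-")
    in either sequence are skipped."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = identical = conservative = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q == "-" or r == "-":
            continue
        compared += 1
        if q == r:
            identical += 1
        elif any(q in group and r in group for group in CONSERVATIVE_GROUPS):
            conservative += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, conservative


def identity_tier(percent_identity: float) -> str:
    """Map percent identity onto the tiers recited for substantial identity."""
    if percent_identity >= 90:
        return "most preferred (>= 90%)"
    if percent_identity >= 80:
        return "more preferred (>= 80%)"
    if percent_identity >= 70:
        return "substantial sequence identity (>= 70%)"
    return "below 70%"


# Hypothetical aligned fragments: D/E and L/I are counted as conservative.
pid, cons = compare_aligned("MKDEVSTAFL", "MKEEVSTAFI")
print(pid, cons, identity_tier(pid))  # 80.0 2 more preferred (>= 80%)
```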

A substantially identical sequence hybridizes to a complementary TAF-encoding
sequence under low stringency conditions, for example, at 50°C and 6X
SSC (0.9 M saline/0.09 M sodium citrate), and remains bound when subjected to
washing at 55°C with 1X SSC.
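
The 6X SSC composition quoted above follows from 1X SSC being 0.15 M NaCl and 0.015 M sodium citrate. A small sketch of that scaling and of the C1V1 = C2V2 dilution used to prepare working solutions; the 20X stock strength, the 500 mL volume and the function names are assumptions, not conditions from this disclosure.

```python
from __future__ import annotations

# 1X SSC composition; 6X SSC is then 0.9 M NaCl / 0.09 M sodium citrate.
NACL_1X_M = 0.15
CITRATE_1X_M = 0.015


def ssc_molarity(strength_x: float) -> tuple[float, float]:
    """Return (NaCl, sodium citrate) molarity of an nX SSC solution."""
    if strength_x < 0:
        raise ValueError("strength must be non-negative")
    return NACL_1X_M * strength_x, CITRATE_1X_M * strength_x


def stock_volume_ml(target_x: float, final_volume_ml: float, stock_x: float = 20.0) -> float:
    """Volume of concentrated SSC stock to dilute to final_volume_ml
    (C1 * V1 = C2 * V2). The 20X default stock strength is an assumption."""
    if stock_x <= 0 or not 0 <= target_x <= stock_x:
        raise ValueError("target strength must lie between 0 and the stock strength")
    return final_volume_ml * target_x / stock_x


for label, strength in (("hybridization", 6.0), ("wash", 1.0)):
    nacl, citrate = ssc_molarity(strength)
    ml = stock_volume_ml(strength, 500.0)
    print(f"{label}: {strength:g}X SSC = {nacl:.2f} M NaCl / {citrate:.3f} M citrate; "
          f"{ml:g} mL of 20X stock per 500 mL")
```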

The invention's TAF-encoding polynucleotides are isolated, meaning that
the claimed sequence is present as other than a naturally occurring chromosome or
transcript in its natural environment. Typically, isolated sequences are removed
from at least some of the nucleotide sequences with which they are normally
associated on a natural chromosome.

A substantially pure or isolated TAF- or TAF portion-encoding nucleic acid
is generally at least about 1% said TAF-encoding nucleic acid by nucleic acid weight;
preferably at least about 10%; more preferably at least about 50%; and most
preferably at least 90%. Nucleic acid weight percentages are determined by
dividing the weight of the TAF or TAF portion-encoding nucleic acid, including
alternative forms and analogs such as alternatively spliced or partially transcribed
forms, by the total nucleic acid weight present.

The invention also provides for TAF sequences modified by transitions,
transversions, deletions, insertions, or other modifications such as alternative
splicing, and such alternative forms; genomic TAF sequences; TAF gene flanking
sequences, including TAF regulatory sequences and other non-transcribed TAF
sequences; TAF mRNA sequences; RNA and DNA antisense sequences
complementary to TAF-encoding sequences; sequences encoding xenogeneic TAFs;
and TAF sequences comprising synthetic nucleotides, e.g., in which the oxygen of the
phosphate group is replaced with sulfur, methyl, or the like.

For modified TAF-encoding sequences or related sequences encoding
proteins with TAF-like functions, there will generally be substantial sequence
identity between at least a portion thereof and a portion of a TAF, preferably at
least about 40%, more preferably at least 80%, most preferably at least 90%,
particularly with conservative substitutions, and particularly within regulatory regions and
regions encoding protein domains involved in protein-protein interactions,
particularly TAF-transcription factor interactions.

Typically, the invention's TAF-encoding polynucleotides are associated
with heterologous sequences. Examples of such heterologous sequences include
regulatory sequences such as promoters, enhancers, response elements, signal
sequences, polyadenylation sequences, introns, 5' and 3' noncoding regions,
etc. Other useful heterologous sequences are known to those skilled in the art or
otherwise disclosed in references cited herein. See, for example, Russell Doolittle, Of
URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences,
University Science Books, Mill Valley, CA.

TAF-encoding nucleic acids can be subject to alternative purification,
synthesis, modification or use by methods disclosed herein or otherwise known in
the art. For example, the nucleic acids can be modified to alter stability,
solubility, binding affinity and specificity, methylation, etc. The nucleic acid
sequences of the present invention may also be modified with a label capable of
providing a detectable signal, either directly or indirectly. Exemplary labels
include radioisotopes, fluorescers, biotinylation, etc.

Nucleic acids encoding at least a portion of a TAF are used to identify
nuclear factors which interact with that TAF using expression screening in yeast as
described in Current Protocols in Molecular Biology. In this example, yeast cells
carrying a cDNA library of fusion genes, in which cDNA is joined to DNA encoding the
activation domain of a transcription factor (e.g. GAL4), are transfected with fusion
genes encoding a portion of a TAF and the DNA binding domain of a transcription
factor. Clones encoding TAF binding proteins provide for the complementation of
the transcription factor and are identified through transcription of a reporter gene.
See, e.g., Fields and Song (1989), Nature 340, 245-246 and Chien et al. (1991),
Proc Natl Acad Sci USA 88, 9578-9582.

The invention also provides vectors comprising nucleic acids encoding a
TAF or portion or analog thereof. A large number of vectors, including plasmid
and viral vectors, have been described for expression in a variety of eukaryotic and
prokaryotic hosts. Vectors will often include one or more replication systems for
cloning or expression, one or more markers for selection in the host, e.g. antibiotic
resistance, and one or more expression cassettes. The inserted TAF coding
sequences may be synthesized, isolated from natural sources, prepared as hybrids,
etc. Ligation of the coding sequences to the transcriptional regulatory sequences
may be achieved by known methods. Advantageously, vectors may also include a
promoter operably linked to the TAF-encoding portion.

Suitable host cells may be transformed/transfected/infected by any suitable
method including electroporation, CaCl2-mediated DNA uptake, viral infection,
microinjection, microprojectile, or other established methods. Alternatively,
nucleic acids encoding one or more TAFs may be introduced into cells by
recombination events. For example, a sequence can be microinjected into a cell
and thereby effect homologous recombination at the site of an endogenous gene
encoding a TAF, an analog or pseudogene thereof, or a sequence with substantial
identity to a TAF-encoding gene. Other recombination-based methods such as
nonhomologous recombination, deletion of an endogenous gene by homologous
recombination, especially in pluripotent cells, etc., provide additional applications.

Appropriate host cells include bacteria, archaebacteria, fungi, especially
yeast, and plant and animal cells, especially mammalian cells. Of particular
interest are E. coli, B. subtilis, Saccharomyces cerevisiae, SF9 and SF21 cells,
C129 cells, 293 cells, Neurospora, and CHO, COS, HeLa cells and immortalized
mammalian myeloid and lymphoid cell lines. Preferred replication systems include
M13, ColE1, SV40, baculovirus, vaccinia, lambda, adenovirus, AAV, BPV, etc.

A large number of transcription initiation and termination regulatory 
elements/regions have been isolated and shown to be effective in the transcription 
and translation of heterologous proteins in the various hosts. Examples of these 
regions, methods of isolation, manner of manipulation, etc. are known in the art. 

The particular choice of vector/host cell is not critical to the invention.

Under appropriate expression conditions, host cells are used as a source of
recombinantly produced TAFs or TAF analogs. Preferred expression systems
include E. coli, vaccinia, or baculovirus, the latter two permitting the recombinant
TAFs to be modified, processed and transported within a eukaryotic system.

TAF-encoding oligonucleotides are also used to identify other TAFs or
transcription factors. For example, 32P-labeled TAF-encoding nucleic acids are
used to screen cDNA libraries at low stringency to identify similar cDNAs that
encode proteins with TAF-related domains. Additionally, TAF-related proteins are
isolated by PCR amplification with degenerate oligonucleotide probes using the
sequences disclosed herein. Other experimental methods for cloning TAFs,
sequencing DNA encoding TAFs, and expressing recombinant TAFs are also set
out in the working exemplification below. Other useful cloning, expression, and
genetic manipulation techniques for practicing the inventions disclosed herein are
known to those skilled in the art.

The compositions and methods disclosed herein may be used to effect gene
therapy. See, e.g., Gutierrez et al. (1992), Lancet 339, 715-721. For example,
cells are transfected with TAF sequences operably linked to gene regulatory
sequences capable of effecting altered TAF expression or regulation. To modulate
TAF translation, cells may be transfected with TAF-complementary antisense
polynucleotides.

Antisense modulation may employ TAF antisense sequences operably linked
to gene regulatory sequences. Cells are transfected with a vector comprising a
TAF sequence with a promoter sequence oriented such that transcription of the
gene yields an antisense transcript capable of binding to TAF-encoding mRNA.
Transcription may be constitutive or inducible and the vector may provide for
stable extrachromosomal maintenance or integration. Alternatively, single-stranded
antisense nucleic acid sequences that bind to genomic DNA or mRNA encoding at
least a portion of a TAF may be administered to the target cell at a concentration
that results in a substantial reduction in TAF expression.
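
An antisense sequence that binds TAF-encoding mRNA is the reverse complement of the target region, written 5' to 3'. A minimal sketch, assuming Watson-Crick pairing only and a hypothetical six-nucleotide target; `antisense` is an illustrative name, and the sketch ignores synthetic backbone chemistry.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(mrna: str, as_dna: bool = False) -> str:
    """Return the 5'->3' antisense sequence for a 5'->3' mRNA target.

    Reverse-complements the target; returns RNA (U) by default or a DNA
    oligonucleotide (T) when as_dna is True.
    """
    seq = mrna.upper().replace("U", "T")
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("target must be a non-empty A/C/G/U(T) sequence")
    rc = seq.translate(DNA_COMPLEMENT)[::-1]
    return rc if as_dna else rc.replace("T", "U")


target = "AUGGCC"  # hypothetical 5'->3' stretch of a TAF mRNA
print(antisense(target))               # GGCCAU
print(antisense(target, as_dna=True))  # GGCCAT
```

Applying the function twice returns the original target, which is a quick check that strand orientation is handled consistently.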

ASSAYS FOR IDENTIFYING TRANSCRIPTION FACTORS AND
THERAPEUTIC AGENTS

The invention provides methods and compositions for identifying agents
useful in modulating gene transcription. Such agents find use in the diagnosis or
treatment of a broad range of diseases including cancer, cardiovascular diseases,
microbial and fungal infections and particularly viral infections, inflammatory
disease, immune disease, etc. The ability to develop rapid and convenient
high-throughput biochemical assays for screening compounds that interfere with the
process of transcription in human cells opens a new avenue for drug development.
An overview of this therapeutic approach is presented in Peterson & Baichwal
(1993), Trends in Biotechnology, in press.

Typically, prospective agents are screened from large libraries of synthetic
or natural compounds. For example, numerous means are available for random
and directed synthesis of saccharide, peptide, and nucleic acid based compounds;
see, e.g., Lam et al. (1991), Nature 354, 82-86. Alternatively, libraries of natural
compounds in the form of bacterial, fungal, plant and animal extracts are available
or readily producible. Additionally, natural and synthetically produced libraries
and compounds are readily modified through conventional chemical, physical, and
biochemical means. Examples of such modifications are disclosed herein.

Useful agents are identified with a range of assays employing TAFs or
TAF-encoding nucleic acids. As examples, protein binding assays, nucleic acid binding
assays and gel shift assays are useful approaches. Exemplary assays include
assaying labeled TBP binding to immobilized TAF, labeled TAF or TAF peptide
binding to immobilized TBP, etc. Many appropriate assays are amenable to
scaled-up, high-throughput usage suitable for volume drug screening. Such screening will
typically require the screening of at least about 10, preferably at least about 100,
and more preferably at least about 1000 prospective agents per week. The
particular assay used will be determined by the particular nature of the TAF
interactions. For instance, a prospective agent may interfere with the function of a
TAF but not with transcription complex assembly. For example, a molecule that
binds to a TAF but does not disrupt complex assembly is identified more readily
through labelled binding assays than through gel retardation assays. Assays may
employ single TAFs, TAF portions, TAF fusion products, partial TAF complexes,
or the complete TFIID transcription complex, depending on the associational
requirements of the subject transcription factor.
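
For the exemplary assay of labeled TBP binding to immobilized TAF, candidate agents can be ranked by how much they reduce the bound signal relative to controls. A minimal sketch, assuming each plate carries uninhibited (no agent) and background (no TAF) control wells and that a 50% inhibition cutoff defines a hit; the control layout, threshold, signal values and compound names are all hypothetical.

```python
from __future__ import annotations

from statistics import mean


def percent_inhibition(signal: float, uninhibited: float, background: float) -> float:
    """Percent reduction of specific signal: 100 * (1 - (S - bg) / (max - bg))."""
    window = uninhibited - background
    if window <= 0:
        raise ValueError("uninhibited signal must exceed background")
    return 100.0 * (1.0 - (signal - background) / window)


def call_hits(
    signals: dict[str, float],
    uninhibited_wells: list[float],
    background_wells: list[float],
    threshold_pct: float = 50.0,  # hypothetical hit cutoff
) -> list[str]:
    """Return, sorted by name, the agents whose inhibition meets the threshold."""
    top = mean(uninhibited_wells)
    bottom = mean(background_wells)
    return sorted(
        name
        for name, s in signals.items()
        if percent_inhibition(s, top, bottom) >= threshold_pct
    )


# Hypothetical counts of labeled TBP bound to immobilized TAF.
plate = {"cmpd_A": 5500.0, "cmpd_B": 9500.0, "cmpd_C": 1500.0}
print(call_hits(plate, [10000.0, 9800.0, 10200.0], [1000.0, 900.0, 1100.0]))  # ['cmpd_A', 'cmpd_C']
```

Normalizing to on-plate controls rather than raw counts keeps hit calls comparable across plates when throughput is scaled up.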

Useful agents are typically those that bind to or modify the association of
transcription-associated factors, especially TAFs. Preferred agents include those
capable of modulating the expression of Pol II genes, particularly oncogenes
(including viral oncogenes such as adenovirus E1A, human papilloma E7, and
cellular oncogenes such as Rb, p53, E2F, myc, fos/jun (AP1), abl, etc.), genes
transcribed during viral infection or activation, and sterol-regulated genes.
Preferred agents modify, preferably disrupt, TAF-TAF, TAF-activator,
TAF-coactivator (coactivators include OCA-B, dTAFII110, etc.) or TAF-TBP binding.
An especially preferred useful agent disrupts the association of a disclosed hTAF
with an activator, particularly a viral-specific activator, particularly an HIV-specific
activator such as tat.

Useful agents are found within numerous chemical classes, though typically
they are organic compounds, preferably small organic compounds. Small organic
compounds have a molecular weight of more than 50 yet less than about 2,500,
preferably less than about 750, more preferably less than about 250. Exemplary
classes include peptides, saccharides, steroids, and the like.
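
The molecular weight tiers for small organic compounds can be applied directly to a compound's formula. A minimal sketch, assuming simple formulas without parentheses or charges and using standard atomic weights; `molecular_weight`, `size_tier` and the aspirin example (C9H8O4, chosen only as a familiar formula, not as an agent of this disclosure) are illustrative.

```python
import re

# Standard atomic weights (g/mol) for elements common in drug-like compounds.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}


def molecular_weight(formula: str) -> float:
    """Molecular weight of a simple formula such as 'C9H8O4' (no parentheses)."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if not tokens or "".join(e + n for e, n in tokens) != formula:
        raise ValueError(f"cannot parse formula {formula!r}")
    total = 0.0
    for element, count in tokens:
        if element not in ATOMIC_WEIGHTS:
            raise ValueError(f"no atomic weight for {element!r}")
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


def size_tier(mw: float) -> str:
    """Apply the molecular weight ranges recited for small organic compounds."""
    if not 50 < mw < 2500:
        return "outside small-molecule range"
    if mw < 250:
        return "more preferred (< 250)"
    if mw < 750:
        return "preferred (< 750)"
    return "small organic compound (< 2,500)"


mw = molecular_weight("C9H8O4")
print(f"{mw:.2f}", size_tier(mw))  # 180.16 more preferred (< 250)
```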

Selected agents may be modified to enhance efficacy, stability,
pharmaceutical compatibility, and the like. Structural identification of an agent
may be used to identify, generate, or screen additional agents. For example,
where peptide agents are identified, they may be modified in a variety of ways to
enhance their stability, such as using an unnatural amino acid, such as a D-amino
acid, particularly D-alanine; by functionalizing the amino or carboxyl terminus,
e.g., for the amino group, acylation or alkylation, and for the carboxyl group,
esterification or amidification; or the like. Other methods of stabilization may
include encapsulation, for example, in liposomes, etc.

Agents may be prepared in a variety of ways known to those skilled in the art. For
example, peptides under about 60 amino acids can be readily synthesized today
using conventional, commercially available automatic synthesizers. Alternatively,
peptides (and protein and nucleic acid agents) are readily produced by known
recombinant technologies.

For therapeutic uses, the compositions and selected agents disclosed herein may be
administered by any convenient route, which will depend upon the nature of the
compound. For small molecular weight agents, oral administration is preferred, and
enteric coatings may be indicated where the compound is not expected to retain
activity after exposure to the stomach environment. Generally the amount
administered will be empirically determined, typically in the range of about 1 to
1000 µg/kg of recipient.

Large proteins are preferably administered parenterally, conveniently in a
physiologically acceptable carrier, e.g., phosphate buffered saline, saline,
deionized water, or the like. Typically, such compositions are added to a retained
physiological fluid such as blood or synovial fluid. Generally, the amount
administered will be empirically determined, typically in the range of about 10 to
1000 µg/kg of the recipient. Other additives may be included, such as stabilizers,
bactericides, etc. These additives will be present in conventional amounts.
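Because both dose ranges above are expressed per kilogram of recipient, the absolute amount scales linearly with body mass. The following is a minimal sketch of that unit conversion. The 70 kg body mass is an illustrative assumption, and the text itself specifies that the actual amount is determined empirically.

```python
def dose_range_mg(body_mass_kg: float, low_ug_per_kg: float,
                  high_ug_per_kg: float) -> tuple[float, float]:
    """Convert a per-kilogram dose range (µg/kg) into an absolute range in mg."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    if not 0 <= low_ug_per_kg <= high_ug_per_kg:
        raise ValueError("expected 0 <= low <= high")
    ug_to_mg = 1e-3
    return (body_mass_kg * low_ug_per_kg * ug_to_mg,
            body_mass_kg * high_ug_per_kg * ug_to_mg)


SMALL_MOLECULE_RANGE = (1.0, 1000.0)  # µg/kg, small molecular weight agents
LARGE_PROTEIN_RANGE = (10.0, 1000.0)  # µg/kg, large proteins given parenterally

mass_kg = 70.0  # hypothetical recipient
print(dose_range_mg(mass_kg, *SMALL_MOLECULE_RANGE))
print(dose_range_mg(mass_kg, *LARGE_PROTEIN_RANGE))
```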

The following examples are offered by way of illustration and not by way
of limitation.


EXAMPLES 

Additional exemplary materials and methods for the purification, cloning and
expression of TAFs are described below, and additional exemplary functional assays
are described in detail. While exemplified primarily for dTAFII110, the disclosed
methods find ready application to other TAFs by those skilled in the art and
familiar with the methods herein or found in standard manuals such as Molecular
Cloning, A Laboratory Manual (2nd Ed., Sambrook, Fritsch and Maniatis, Cold Spring
Harbor) and Current Protocols in Molecular Biology (Eds. Ausubel, Brent, Kingston,
Moore, Seidman, Smith and Struhl, Greene Publ. Assoc., Wiley-Interscience, NY, NY,
1992).

Immunopurified dTFIID complex is necessary and sufficient to mediate Sp1 activation in vitro

In order to determine if the TFIID complex is sufficient to substitute for a
partially purified TFIID fraction, we have purified the TBP-TAF complex
extensively by using an affinity resin coupled to a specific monoclonal antibody
directed against TBP. Transcriptionally active TFIID purified from Drosophila
embryos was obtained by eluting the complex from the antibody affinity resin with
a low concentration (0.5 M) of guanidine hydrochloride in the presence of a
synthetic peptide corresponding to the epitope recognized by monoclonal antibody
42A11. The antibody used for the immunopurification remained bound to the protein
G-sepharose beads and was found in the pellet. The proteins were electrophoresed on
an 8% polyacrylamide-SDS gel and detected by silver staining. The resulting gels
reveal seven major TAFs in the complex, ranging in size from 30 to over 200 kD.
After dialysis of the purified TFIID complex to remove the peptide and denaturant,
in vitro transcription reactions were carried out in the presence of basal factors
that were isolated from Drosophila embryo nuclear extracts (Dynlacht et al., 1991;
Wampler et al., 1990). Without the TFIID fraction there is no detectable
transcription. Purified, recombinant dTBP is able to direct basal but not activated
transcription. In contrast, the immunopurified TFIID complex is able to mediate
both basal expression and Sp1 activation. Sp1-dependent activation with the TFIID
fraction is shown in lanes 7 and 8. For the in vitro transcription assay, 2 µl of
the immunopurified TFIID complex was assayed, and transcription was measured by
primer extension. The results demonstrate that the immunopurified TFIID complex
containing TBP and at least 7 specific TAFs is necessary and sufficient for
Sp1-dependent activation of transcription in vitro. As expected, the impure TFIID
fraction also mediates transcriptional activation by Sp1, while the recombinant
TBP protein is only able to direct basal, but not activated, transcription. The
immunopurified complex is also able to support activation by other transcription
factors such as NTF-1.

Cloning and expression of Drosophila TAF110 cDNAs

Purified TFIID complex was used to immunize a mouse, and monoclonal antibodies
were generated against TAF110 (see Experimental Procedures below). The serum from
the immunized mouse was also collected, and the polyclonal antibodies were used to
screen a λgt11 expression library constructed from Drosophila embryo cDNA (Zinn et
al., 1988). One clone was tentatively classified as a TAF110 cDNA because it
produced protein that cross-reacted with independently isolated anti-TAF110
monoclonal antibodies. This partial cDNA clone was subsequently used as a probe to
isolate full-length cDNAs from a λgt10 library (Poole et al., 1985). The longest
clone obtained was 4.6 kb. This cDNA is polyadenylated at the 3' end and appears
to be nearly full-length, based on the size of the mRNA as determined by Northern
blot analysis. The 4.6 kb cDNA clone contains a long open reading frame coding for
a protein of 921 amino acids (SEQUENCE ID NO:1), with a calculated molecular
weight of 99.4 kD and an estimated pI of 10.1. Within the predicted amino acid
sequence, there are 3 peptides that correspond to amino acid sequences determined
from Lys-C peptides generated from HPLC-purified TAF110. For microsequencing, the
TFIID complex was immunopurified from fractionated embryo nuclear extract, and the
TAFs were separated from TBP and the antibody by elution with 1 M guanidine-HCl.
The purified TAFs were fractionated on a C4 reverse phase HPLC column. Three
adjacent fractions containing TAF110 as the major species were cleaved with the
protease Lys-C, and the resulting peptides were purified and sequenced. Three
peptide sequences were found that match the predicted amino acid sequence of the
TAF110 cDNA.
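Matching microsequenced peptides to a predicted protein, and computing a calculated molecular weight, can each be illustrated in a few lines. The sketch below has three stated limitations. It uses standard average residue masses. It applies a simplified Lys-C rule (cleavage after every lysine, no missed cleavages). Its toy sequence is hypothetical, since SEQUENCE ID NO:1 is not reproduced in this section. The estimated pI would additionally require a chosen set of side-chain pKa values and is not computed here.

```python
# Standard average residue masses (Da); one water is added per chain for the termini.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_DA = 18.01528


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_DA


def lys_c_digest(protein: str) -> list[str]:
    """Simplified in silico Lys-C digest: cut after every K, no missed cleavages."""
    peptides: list[str] = []
    current: list[str] = []
    for aa in protein.upper():
        current.append(aa)
        if aa == "K":
            peptides.append("".join(current))
            current = []
    if current:  # C-terminal peptide that does not end in K
        peptides.append("".join(current))
    return peptides


def locate_peptides(protein: str, sequenced: list[str]) -> dict[str, int]:
    """0-based start of each sequenced peptide in the predicted protein, or -1 if absent."""
    return {pep: protein.upper().find(pep.upper()) for pep in sequenced}


toy = "MSQQLKTSAQKGDE"  # hypothetical sequence, not TAF110
print(lys_c_digest(toy))                       # ['MSQQLK', 'TSAQK', 'GDE']
print(locate_peptides(toy, ["TSAQK", "WWW"]))  # {'TSAQK': 6, 'WWW': -1}
print(round(average_mass(toy), 1))
```

For scale, the 99.4 kD calculated for the 921-residue polypeptide corresponds to a mean residue mass of about 108 Da.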

We have expressed TAF110 protein in a variety of cell types. The protein was
expressed from the cloned gene in a baculovirus expression system and detected by
western blot using a TAF110 monoclonal antibody. The protein encoded by the TAF110
cDNA has the same apparent molecular weight as the endogenous protein in the TFIID
fraction derived from Drosophila cells, and the protein produced from the cloned
gene cross-reacts with monoclonal antibodies directed against the TAF110 protein
isolated from embryos. Taken together, these results demonstrate that the 4.6 kb
cDNA encodes the full-length TAF110 protein.

TAF110 appears to be a single-copy gene in Drosophila, based on low-stringency
Southern blot analysis. The TAF110 gene is located at 72D4-5 on the left arm of
the third chromosome. No previously identified Drosophila genes have been assigned
to this chromosomal location (Lindsley and Zimm, 1992).

Hybridomas producing antibodies against TAF110 were selected by screening cell
culture supernatants for those containing antibodies that specifically recognize
the 110 kD protein in a western blot. For westerns, approximately 50 µg of the
TFIID fraction was immunoprecipitated with antibodies against dTBP or TAF110. The
α-TAF110 monoclonal antibody 33G8 was obtained from hybridoma culture medium and
purified by binding to protein G-sepharose. Proteins were eluted from the resin by
boiling in sample buffer, electrophoresed on an 8% polyacrylamide gel, and silver
stained. Several of the α-TAF110 monoclonals obtained by this method specifically
immunoprecipitate the same set of proteins as α-dTBP antibodies. This demonstrates
that at least part of TAF110 is accessible to our antibodies, and therefore
exposed in the native TFIID complex and positioned for interaction with activators.

Monoclonal antibodies specific for other Drosophila TAFs can also immunopurify the
same TFIID complex as α-TBP and α-TAF110 antibodies. Thus, there appears to be one
predominant TBP-containing complex in the TFIID fraction, as opposed to a
heterogeneous set of complexes containing different sets of TAFs bound to TBP. Our
methods are also used to determine if there are rare, perhaps tissue-specific,
TBP-containing complexes that might contain different collections of TAFs, or if
the activity of the TAFs could be modulated by post-translational modifications.
For example, TAF200 does not stain as intensely as the other TAFs and TBP and, on
this basis, might not be present in all complexes. However, this protein seems to
be an authentic member of the major TFIID complex, since antibodies directed
against TAF200 immunopurify a set of proteins that appear to be identical to
complexes purified by antibodies directed against TBP or other TAFs. The
preparations of the purified TFIID complex contain some polypeptides that are less
abundant than the major TAF proteins. Based on western analyses with α-TAF
antibodies, these minor species appear to be proteolytic breakdown products of
larger TAFs or substoichiometric TAFs.

The TAF110 coding sequence contains several regions that are rich in glutamine
residues or rich in serine and threonine residues, and the C-terminal third of the
protein is highly charged: the C-terminal region of the molecule contains 32%
acidic or basic residues. We searched the existing databases for genes similar to
the TAF110 gene and found that it is not highly homologous to any previously
identified genes. In particular, TAF110 did not show any similarity to any DNA
binding domains. Interestingly, Sp1 received one of the highest scores in the NBRF
protein sequence database search for similarity to TAF110. The amino-terminal third
of TAF110 has an organization similar to the activation domains of Sp1, consisting
of glutamine-rich regions flanked by serine-threonine-rich domains. The two
proteins share 21% amino acid identity and 35% similarity over 260 residues.
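The composition and identity figures above depend on definitions the text does not spell out. As a rough sketch, the two quantities can be computed as follows. The sketch assumes that "acidic or basic" means D, E, K and R, with His left out by default. It also assumes that identity is counted over all columns of a pairwise alignment, with gapped columns included in the denominator. Similarity scores additionally depend on a substitution matrix and are not reproduced here. The example strings are hypothetical.

```python
ACIDIC = frozenset("DE")
BASIC = frozenset("KR")  # His excluded by default; pass a custom set to include it


def charged_fraction(region: str, charged: frozenset[str] = ACIDIC | BASIC) -> float:
    """Fraction of residues in `region` that belong to the `charged` set."""
    region = region.upper()
    if not region:
        raise ValueError("region must not be empty")
    return sum(aa in charged for aa in region) / len(region)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(a == b and a != gap for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


print(charged_fraction("DEKRAAAA"))          # 0.5
print(percent_identity("AC-DE", "ACQDF"))    # 60.0
# Positions implied by 21% identity and 35% similarity over 260 residues:
print(round(0.21 * 260), round(0.35 * 260))
```

Under these definitions, the reported Sp1/TAF110 comparison corresponds to roughly 55 identical and 91 similar positions.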

This unexpected similarity to Sp1 prompted us to consider a possible functional
relationship between Sp1 and TAF110, in particular whether the amino-terminal
region of TAF110 might contain interaction surfaces for activators such as Sp1,
especially since the glutamine-rich A and B domains of Sp1 are responsible for
mediating Sp1-Sp1 interactions as well as activation. Indeed, one of the unique
properties of Sp1 activation domains is their capacity to mediate a phenomenon
called superactivation, in which a truncated form of Sp1 lacking the zinc fingers
but containing glutamine-rich domains A and B is able to interact directly with
DNA-bound full-length Sp1. This interaction increases the number of activation
domains at the promoter and can greatly enhance expression of a gene regulated by
Sp1 binding sites. This type of interaction also appears to be involved in
synergistic activation mediated by distally and proximally bound Sp1.

dTAF110 can function as a target for the Sp1 activation domains

To test for functional homology between the similar domains, we asked if the
N-terminal region of TAF110 could function as a target for the Sp1 activation
domains in a superactivation assay. The amino-terminal 308 residues of TAF110 were
fused to the DNA binding domain of the GAL4 protein, G4(1-147), and tested in a
transient cotransfection assay in Drosophila Schneider cells. This hybrid
construct, by itself, weakly activates (4-fold) a reporter gene that is dependent
on GAL4 binding sites. This low level of activity is similar to the modest
activation observed with constructs containing the Sp1 B domain fused to GAL4.
When this TAF110 hybrid construct is cotransfected with DNA expressing the
glutamine-rich A and B domains of Sp1 (N539), a 60-fold increase in transcription
is observed; relative to the 4-fold activation by the hybrid alone, this is a
15-fold superactivation. This superactivation is dependent on the TAF110
sequences, since Sp1(N539) is unable to stimulate transcription when cotransfected
with G4(1-147) alone. The interaction with Sp1 apparently requires an extended
region of TAF110, since GAL4 fusion proteins bearing TAF110 residues 1-137,
138-308, or 87-308 are unable to mediate superactivation by Sp1.
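The fold values in this paragraph are related by a simple ratio. The superactivation attributed to co-expressed Sp1(N539) is the activation seen with both constructs, divided by the activation given by the GAL4-TAF110 hybrid alone. A minimal sketch of that arithmetic follows. The raw reporter readings in the first usage line are hypothetical. The 4-fold and 60-fold values come from the text.

```python
def fold_activation(reporter_signal: float, baseline_signal: float) -> float:
    """Reporter signal relative to the reporter-only baseline."""
    if baseline_signal <= 0:
        raise ValueError("baseline signal must be positive")
    return reporter_signal / baseline_signal


def superactivation(fold_with_coactivator: float, fold_hybrid_alone: float) -> float:
    """Additional fold increase contributed by the co-expressed Sp1 fragment."""
    if fold_hybrid_alone <= 0:
        raise ValueError("fold activation of the hybrid alone must be positive")
    return fold_with_coactivator / fold_hybrid_alone


# Hypothetical raw readings: baseline 1,000 units, hybrid alone 4,000 units.
print(fold_activation(4000.0, 1000.0))  # 4.0
# Values from the text: 4-fold with the hybrid alone, 60-fold with Sp1(N539).
print(superactivation(60.0, 4.0))       # 15.0
```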

These results indicate that the N-terminal 308 amino acids of TAF110 are
sufficient for mediating an interaction with the glutamine-rich activation domains
of Sp1 that leads to superactivation. In the positive control for this experiment,
a GAL4-Sp1 B domain fusion is superactivated approximately 50-fold by the
fingerless Sp1 mutant. In a search for other potential targets of Sp1, we tested
some additional members of the TFIID complex for the ability to mediate
superactivation by Sp1. For example, GAL4 hybrids containing TAF40, TAF80, or the
amino-terminal region of dTBP were found to be inactive in the superactivation
assay. This result shows that the interaction between TAF110 and Sp1 in Drosophila
cells is quite specific and that the other subunits of the TBP-TAF complex that we
tested are unable to interact with the glutamine-rich activation domains of Sp1.




wo 94/17087 



PCT/US94/01114 



dTAF110 and Sp1 interact in yeast

The superactivation assay in Drosophila Schneider cells provided the first hint
that TAF110 may serve as a coactivator for Sp1. However, it is difficult to assess
in this assay whether TAF110 can interact with Sp1 in the absence of the other
TAFs that are present in Drosophila cells. The superactivation assay also imposes
certain limitations on the number and types of constructs that can be tested.
Moreover, it seemed prudent to establish several independent assays to investigate
the relationship between TAF110 and transcription activation domains. Therefore,
we carried out two additional types of assays, one in vivo and one in vitro, to
test the results obtained in Schneider cells. First, we tested the ability of
TAF110 and Sp1 to interact in a versatile assay for protein-protein interaction
that is carried out in yeast cells (Fields and Song, 1989). This strategy takes
advantage of the modular organization of eukaryotic transcription factors. In this
assay, one of the partners to be tested is fused to the DNA binding domain of GAL4
and, in a separate molecule, the other partner is fused to the acidic activation
domain (AAD). A functional activation domain is recruited to the target promoter
bearing GAL4 binding sites, and the lacZ reporter gene is expressed, only if there
is a protein-protein interaction between the partners being tested.

Full-length TAF110 as well as a variety of deletion mutants were fused to the DNA
binding domain of GAL4, G4(1-147). In contrast to the situation in Drosophila
cells, the amino-terminal region of TAF110 cannot activate transcription by itself
in yeast. This result was anticipated, since glutamine-rich activation domains
have not been observed to function in yeast. As potential partners for TAF110, the
Sp1 activation domains were fused to the acidic activation domain of GAL4. Each of
the Sp1 glutamine-rich activation domains, A or B, can independently interact with
full-length TAF110 as judged by activation of the reporter gene. In these
experiments, yeast bearing an integrated GAL1-lacZ fusion were transformed with
two plasmids: (1) fusions to the DNA binding domain of GAL4 (residues 1-147), and
(2) fusions to the acidic activation domain (AAD; residues 768-881 of GAL4); the
resulting β-gal activity was measured (expressed in units/mg of protein).
Interestingly, domain A of Sp1 appears to interact more efficiently than domain B,
and this correlates well with the previous finding that A is a better activator of
transcription than domain B (Courey and Tjian, 1988). As in Drosophila cells,
residues 1-308 of TAF110 are sufficient for the interaction, while regions 1-137
and 138-308 are inactive. The full-length TAF110 fusion is more active than the
N308 construct in this assay. Although this effect may be due to differential
protein expression, it is possible that the C-terminal regions of TAF110
contribute to interactions with Sp1. The protein-protein interaction assay in
yeast further supports the idea that TAF110 interacts, directly or indirectly,
with the activation domains of Sp1, and the strength of this interaction appears
to be correlated with transcriptional function.
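The text reports β-gal activity in units per mg of protein without defining the unit. One common convention, assumed here rather than taken from the source, expresses activity as nmol of o-nitrophenol (ONP) released from ONPG per minute per mg of extract protein. A minimal sketch of that normalization follows. The default extinction coefficient is an assumption to check against the assay actually used, and the readings are hypothetical.

```python
def beta_gal_specific_activity(od420: float, reaction_volume_ml: float,
                               minutes: float, protein_mg: float,
                               epsilon_per_mM_cm: float = 4.5,
                               path_cm: float = 1.0) -> float:
    """nmol ONP released per minute per mg of extract protein.

    Beer-Lambert: [ONP] (mM) = A420 / (epsilon * path); mM x ml = umol; x 1000 = nmol.
    The default epsilon (about 4.5 per mM per cm for ONP after an alkaline stop)
    is an assumption.
    """
    if minutes <= 0 or protein_mg <= 0:
        raise ValueError("time and protein amount must be positive")
    onp_nmol = od420 / (epsilon_per_mM_cm * path_cm) * reaction_volume_ml * 1000.0
    return onp_nmol / (minutes * protein_mg)


# Hypothetical readings for two bait/prey combinations, compared as a ratio.
combo_1 = beta_gal_specific_activity(od420=0.90, reaction_volume_ml=1.0, minutes=10, protein_mg=0.02)
combo_2 = beta_gal_specific_activity(od420=0.30, reaction_volume_ml=1.0, minutes=10, protein_mg=0.02)
print(combo_1, combo_2, combo_1 / combo_2)
```

Normalizing to protein makes extracts of different concentration comparable. Differences in fusion-protein expression, which the text raises for the full-length versus N308 constructs, are not corrected by this step.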

The other TAF proteins that have been tested in the superactivation assay or the
yeast assay displayed no detectable interaction with Sp1. However, the GAL4 fusion
proteins that these assays rely on might not be able to participate in all the
correct interactions, because some surfaces could be sterically blocked.
Therefore, additional strategies, such as the use of full-length Sp1, are used to
test for other potential interactions.


dTAF110 does not interact with other activators tested

To determine whether the interaction between Sp1 and TAF110 is specific, or
whether other types of activators also interact with TAF110, we used the yeast
assay to test a variety of other activation domains, including the acidic
activation domain of GAL4 (Ma and Ptashne, 1987) and the proline-rich activation
domain of CTF (Mermod et al., 1989). Neither of these two activators displayed any
interaction with TAF110 in the yeast assay. In addition, we tested activation
domains from the Drosophila proteins Antennapedia (Antp) and bicoid (bcd), both of
which are glutamine-rich. Surprisingly, both of these glutamine-rich domains
failed to interact with TAF110 in the yeast assay. Since TAF110 can interact with
both Sp1 domains A and B, which have no significant homology other than high
glutamine content, but not with Antp and bcd, which are even more glutamine-rich
than Sp1, it appears that glutamine content alone may not be a sufficient
criterion for the classification of functionally similar activation domains. In
this regard, it may be useful to draw a distinction between the Sp1 activation
domains, which are approximately 25% glutamine and flanked by
serine/threonine-rich sequences, and the bcd and Antp sequences, which are
partially composed of uninterrupted stretches of glutamines and lack adjacent
serine/threonine sequences.
The N-terminal region of TAF110, containing the glutamine- and
serine/threonine-rich sequences, is able to function as a weak activation domain
in Drosophila cells, suggesting that this region can interact with a component of
the native TFIID complex. To determine whether the N-terminal region of TAF110 is
similar to the Sp1 activation domains, which can mediate multimerization, we
tested for TAF110-TAF110 interactions. We found that the N-terminal region of
TAF110 is able to interact with itself, as judged by activation of the lacZ
reporter gene in the yeast assay (Figure 6A). This is another example of
functional similarity between the Sp1 activation domains and the N-terminal region
of TAF110, which can interact with each other as well as with themselves.

TBP and other TAFs tested do not interact with Sp1 in yeast

Since Sp1 synergistically activates transcription through multiple sites even
though it does not bind cooperatively to DNA, we sought to determine whether Sp1
works via interactions with multiple targets or coactivators. We therefore tested
two other members of the TFIID complex, TAF40 and TAF80. As in the superactivation
assay in Drosophila cells, neither TAF40 nor TAF80 displayed any ability to
interact with Sp1 under the conditions of the yeast assay. In addition, the
conserved C-terminal domain of TBP was tested for Sp1 interaction in yeast, but no
interaction was observed. We were unable to test full-length dTBP in this assay
because it functions as an activator in yeast when fused to the GAL4 DNA binding
domain. These results show that the interaction between TAF110 and Sp1 is quite
specific, and that TAF80, TAF40, and the conserved region of TBP do not appear to
be targets for Sp1.

Since the TFIID complex is also required at promoters that lack a TATA box, one of
the TAFs might be required for promoter recognition through the initiator element.
In addition to communicating with promoter-selective factors, the TAFs interact
with each other, at least one TAF interacts with TBP, and one interacts with RNA
polymerase II or one of the basal factors.


Sp1 binds dTAF110 in vitro

The superactivation assay in Schneider cells and the yeast experiments are both
indirect assays for protein-protein interactions. Therefore, we also determined
the ability of Sp1 to bind directly to TAF110 in vitro. Biotinylated
oligonucleotides containing Sp1 binding sites were coupled to streptavidin-agarose
resin. The resin was incubated with Sp1 that had been over-expressed and purified
from HeLa cells infected with a vaccinia virus expression vector (Jackson et al.,
1990). After allowing Sp1 to bind DNA on the beads, the unbound Sp1 was washed
away. Control resin that lacked Sp1 was also prepared and tested in parallel.
These resins were incubated in batch with 35S-labeled TAF110 synthesized in vitro
in a reticulocyte lysate. After incubation with the labeled protein, the beads
were extensively washed, and the bound proteins were eluted in two steps with
buffer containing 0.2 M KCl followed by 1.0 M KCl. The 1.0 M salt incubation
elutes Sp1 from the DNA. The input, unbound supernatant, and eluted fractions were
subsequently analyzed by SDS-PAGE and autoradiography. Samples from the binding
reaction were also analyzed by silver staining to detect nonspecific binding of
proteins present in the reticulocyte lysate.
35S-labeled TAF110 synthesized in vitro in a reticulocyte lysate was incubated
with streptavidin-agarose beads with or without DNA-bound Sp1, and protein
fractions were run on SDS-PAGE and analyzed by autoradiography or by silver
staining. After allowing TAF110 to bind Sp1, the beads were pelleted and the
supernatant containing the unbound proteins was collected. The resin was washed 4
times. The specifically bound proteins were eluted by incubating the beads in
buffer containing 0.2 M KCl, followed by 1.0 M KCl; the Sp1 protein bound to the
DNA is eluted by treatment with 1.0 M salt. Labeled TAF110 protein is detectable
in the eluted fractions. No detectable TAF110 protein bound to the DNA affinity
resin in the absence of Sp1 protein. Quantitation of these results by analysis of
the gel in a PhosphorImager (Molecular Dynamics) indicates 60-fold greater binding
of labeled TAF110 to the Sp1-containing resin than to the control resin. The
silver-stained gel showed that Sp1 is the major species in the eluate, indicating
that the unlabeled proteins in the extract are not able to bind Sp1.
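The 60-fold figure is a ratio of bound radioactivity on the Sp1 resin to that on the control resin. The sketch below shows one way such a ratio can be formed from PhosphorImager lane intensities. Summing the 0.2 M and 1.0 M KCl eluates and subtracting a per-lane background are assumptions about the workflow, not steps stated in the text. The intensities are hypothetical.

```python
def fold_enrichment(sp1_eluates: list[float], control_eluates: list[float],
                    background_per_lane: float = 0.0) -> float:
    """Background-subtracted signal bound to the Sp1 resin divided by that bound to control resin."""
    sp1_bound = sum(s - background_per_lane for s in sp1_eluates)
    control_bound = sum(s - background_per_lane for s in control_eluates)
    if control_bound <= 0:
        raise ValueError("control signal is at or below background; ratio is undefined")
    return sp1_bound / control_bound


# Hypothetical intensities for the 0.2 M and 1.0 M KCl eluates of each resin.
print(fold_enrichment([1100.0, 5100.0], [150.0, 150.0], background_per_lane=100.0))  # 60.0
```

Raising an error when the control lanes fall to background avoids reporting a ratio that is really set by noise in the denominator.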

These data show that TAF110 is selectively retained on the resin containing
DNA-bound Sp1, but does not bind the control resin that lacks Sp1. Most of the
bound TAF110 elutes with Sp1 at 1.0 M KCl, with a lower amount eluting at 0.2 M
KCl. Analysis of the fractions by silver staining indicates that Sp1 is the major
protein detectable in the high-salt eluate, indicating that the unlabeled proteins
present in the reticulocyte lysate, which constitute the vast majority of the
total protein in the input, are not binding nonspecifically to Sp1 in this assay.
To rule out the possibility that an intermediary protein, perhaps some other TAF
or other eukaryotic protein, was required for the Sp1-TAF110 interaction, this
experiment was repeated using 35S-labeled TAF110 synthesized in an in vitro
transcription/translation extract derived from E. coli (Skelly et al., 1987). The
TAF110 protein synthesized in the prokaryotic system was also specifically
retained on the Sp1 affinity resin, providing further evidence that Sp1 can bind
directly to TAF110.

As an additional test of specificity, we also determined whether deletion mutants
of TAF110 could bind to Sp1 in this in vitro assay (mutant residue numbers are
counted from the N terminus). A 1-137 mutant was not able to bind Sp1 in vitro,
while some binding was obtained with a 1-308 mutant. Mutants 308-921, 447-921, and
571-921 were all effective in binding Sp1, while C-terminal deletions beyond
residue 852 in these mutants eliminated Sp1 binding. These results indicate the
importance of the 852-921 region and the 138-308 region of TAF110 in interaction
with transcriptional activators.

TAF110 does not directly bind TBP
Our experiments indicate that TAF110 cannot directly bind to TBP by itself and
that at least one additional TAF is required to connect TAF110 and TBP. For
example, α-TAF110 antibodies fail to coprecipitate in vitro-expressed TAF110 and
TBP together, and the same result is obtained with α-TBP antibody.

Exemplary Experimental Procedures
Purification of the TFIID complex

For the in vitro transcription assay, the TFIID complex was immunopurified from
the partially purified TFIID fraction (Q-sepharose fraction, 0.3 M KCl eluate)
(Dynlacht et al., 1991) using the α-dTBP monoclonal antibody 42A11 coupled to
protein G-sepharose (Pharmacia). The immunoprecipitates were washed with 0.1 M
KCl-HEMG-ND buffer (25 mM HEPES pH 7.6, 0.1 mM EDTA, 12.5 mM MgCl2, 10% glycerol,
0.1% NP-40, 0.1 mM DTT), and the TFIID complex was eluted from the antibody by
addition of 10 mg/ml of the peptide mimicking the epitope of 42A11 (sequence:
NH2-RPSTPMTPATPGSADPG-COOH) in HEMG buffer containing 0.5 M guanidine-HCl. The
eluate was dialyzed against 0.1 M KCl-HEMG-ND and then assayed for transcription
activity.
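The 0.1 M KCl-HEMG-ND buffer listed above can be made up from concentrated stocks with C1V1 = C2V2. A minimal sketch follows. The final concentrations come from the text. The stock concentrations, the 100 ml batch size, and treating the glycerol and NP-40 percentages as v/v are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    final: float  # final concentration (from the text)
    stock: float  # stock concentration in the same unit (hypothetical)
    unit: str


# 0.1 M KCl-HEMG-ND; stock strengths are assumptions, not from the source.
KCL_HEMG_ND = (
    Component("KCl", 100.0, 2000.0, "mM"),
    Component("HEPES pH 7.6", 25.0, 1000.0, "mM"),
    Component("EDTA", 0.1, 500.0, "mM"),
    Component("MgCl2", 12.5, 1000.0, "mM"),
    Component("glycerol", 10.0, 100.0, "% v/v"),
    Component("NP-40", 0.1, 10.0, "% v/v"),
    Component("DTT", 0.1, 1000.0, "mM"),
)


def recipe(components, total_ml: float) -> dict[str, float]:
    """Stock volumes (ml) from C1*V1 = C2*V2, plus water to bring the batch to total_ml."""
    volumes: dict[str, float] = {}
    for c in components:
        if c.stock < c.final:
            raise ValueError(f"{c.name}: stock is weaker than the final concentration")
        volumes[c.name] = c.final * total_ml / c.stock
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute to fit in the requested volume")
    volumes["water"] = water
    return volumes


for name, ml in recipe(KCL_HEMG_ND, total_ml=100.0).items():
    print(f"{name:>14}: {ml:.3f} ml")
```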


Purification of dTAF110

Nuclear extracts derived from approximately 1 kg of Drosophila embryos were
prepared and fractionated as previously described (Dynlacht et al., 1991; Wampler
et al., 1990). For protein sequencing, the TFIID complex was purified with
polyclonal α-dTBP antibodies as previously described (Dynlacht et al., 1991) or
with a monoclonal antibody as described above. The TAFs were separated from TBP by
elution of the protein A-antibody resin with 0.1 M KCl-HEMG buffer containing
1.5 mM DTT, 0.1% LDAO (lauryldimethylamine oxide), and 1 M Gd-HCl. The TAFs were
eluted by batch incubation of the resin with an equal volume of buffer for 25 min
at 4 °C. This procedure was repeated and the two supernatants were combined. Urea
was added to 8 M, DTT to 10 mM, and cysteines were modified with 4-vinylpyridine.

Two approaches were used to separate the TAFs: HPLC and PAGE. In the HPLC
approach, the TAFs were fractionated by reverse phase HPLC on a 300 Å C4 column
(2.1 × … mm). The proteins were eluted with a gradient from 20-70% buffer B
(buffer A = 0.1% TFA, 1% n-propanol; buffer B = 0.1% TFA, 1% n-propanol, 60%
isopropanol, 30% acetonitrile). TAF110 consistently eluted at …% buffer B.
Fractions containing TAF110 (approximately 5 µg) were lyophilized, resuspended in
100 mM Tris, pH 8.0, and 2 M urea, and incubated at 55 °C for 10 min. 150 ng of
the protease Lys-C was added, and the protein was digested for 20 hr at 37 °C.
Peptides were chromatographed and sequenced as previously described (Williams et
al., 1988).
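The 20-70% buffer B gradient can be translated into actual solvent composition by linear mixing of buffers A and B. A minimal sketch follows. Buffer B's n-propanol content is taken as 1%, the same as buffer A, because that value is partly illegible in the source. The run duration passed to `linear_gradient` is hypothetical, since the text does not state it.

```python
BUFFER_A = {"TFA": 0.1, "n-propanol": 1.0}  # % v/v
BUFFER_B = {"TFA": 0.1, "n-propanol": 1.0,  # n-propanol read as 1% (assumption)
            "isopropanol": 60.0, "acetonitrile": 30.0}


def mobile_phase(percent_b: float) -> dict[str, float]:
    """Eluent composition (% v/v) at a given %B, by linear mixing of buffers A and B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    fb = percent_b / 100.0
    solvents = sorted(set(BUFFER_A) | set(BUFFER_B))
    return {s: (1.0 - fb) * BUFFER_A.get(s, 0.0) + fb * BUFFER_B.get(s, 0.0)
            for s in solvents}


def linear_gradient(t_min: float, run_min: float,
                    start_b: float = 20.0, end_b: float = 70.0) -> float:
    """%B at time t for a linear start_b -> end_b ramp lasting run_min (hypothetical)."""
    if run_min <= 0:
        raise ValueError("run_min must be positive")
    frac = min(max(t_min / run_min, 0.0), 1.0)  # hold at the end points outside the ramp
    return start_b + (end_b - start_b) * frac


print(mobile_phase(50.0))
print(linear_gradient(30.0, run_min=60.0))  # 45.0
```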

In the gel electrophoresis approach, the TAFs were separated by electrophoresis
and transferred to membranes. The separated TAFs were digested with Lys-C or
trypsin, and the resultant peptides were eluted, chromatographed and sequenced.
See Fernandez et al. (1992) Analytical Biochemistry 201, 255-264.
In vitro transcription assay

Transcription factor fractions were reconstituted with basal factor fractions
derived from 0-12 hr Drosophila embryo nuclear extracts essentially as previously
described (Dynlacht et al., 1991), except that TFIIB was separated from
TFIIE/TFIIF and pol II was fractionated further on a phosphocellulose column. Each
reaction contained … µg of the TFIIB fraction (S-sepharose 0.5 M eluate), 1.5 µg
of the TFIIE/TFIIF fraction (S-sepharose 0.25 M eluate), and 0.25 mg of the pol II
fraction (phosphocellulose 0.4 M eluate). Some reactions contained 1.5 µg of the
TFIID fraction or 2 ng of purified, recombinant dTBP that had been expressed in
E. coli (Hoey et al., 1990). The template for the in vitro transcription reaction
was BCAT (Lillie and Green, 1989), containing 3 Sp1 binding sites, and
transcription was assayed by primer extension.
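When many reactions are assembled at once, the shared basal factors can be pooled into a master mix. The sketch below makes three choices of its own. It assumes a 10% pipetting overage, a common convention that the text does not state. It leaves the TFIIB amount for the user to add, because that amount is illegible in the source. It keeps the TFIID fraction (or recombinant dTBP) out of the mix, because only some reactions received it.

```python
def master_mix(per_reaction: dict[str, float], n_reactions: int,
               overage: float = 0.10) -> dict[str, float]:
    """Total of each component for n_reactions, scaled up by a fractional overage."""
    if n_reactions < 1 or overage < 0:
        raise ValueError("need at least one reaction and a non-negative overage")
    scale = n_reactions * (1.0 + overage)
    return {name: amount * scale for name, amount in per_reaction.items()}


# Per-reaction amounts as given in the text (units kept in the key names).
SHARED_BASAL_FACTORS = {
    "TFIIE/TFIIF fraction (ug)": 1.5,
    "pol II fraction (mg)": 0.25,
    # "TFIIB fraction (ug)": ...,  amount not legible in the source; supply it here
}

for name, total in master_mix(SHARED_BASAL_FACTORS, n_reactions=8).items():
    print(f"{name}: {total:.3g}")
```

Keeping the variable component (TFIID fraction versus recombinant dTBP versus none) outside the mix lets the same pool serve every lane of the comparison.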

Generation of antibodies against the TAFs

Immunopurified TFIID complex (approximately 10 µg/injection) was mixed with Ribi's
adjuvant and injected intraperitoneally into a Swiss-Webster mouse at days 0, 7,
and 21. The initial immune response was monitored at day 28 and boosted further by
two biweekly injections of more antigen. After an intravenous injection of one
further dose of antigen, the spleen was dissected out and electrofused with
myeloma cells. Approximately 600 supernatants from 96-well dishes (each well
containing on average 5 independent hybridomas) were assayed on western strip
blots for cross-reactivity with immunopurified TFIID complex proteins. Hybridomas
from wells producing anti-TAF and/or anti-TBP antibodies were cloned by limiting
dilution and tested by western blotting and immunoprecipitation assays.
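The screen above covered roughly 600 wells × 5 hybridomas, or about 3,000 hybridomas. Cloning by limiting dilution is usually planned with Poisson statistics. The sketch below is a standard textbook calculation, not a protocol from the source. The seeding densities are hypothetical, and it assumes every seeded cell grows out.

```python
import math


def poisson_pmf(k: int, mean_cells_per_well: float) -> float:
    """Probability of seeding exactly k cells in a well at the given mean (Poisson)."""
    return mean_cells_per_well ** k * math.exp(-mean_cells_per_well) / math.factorial(k)


def p_single_cell_given_growth(mean_cells_per_well: float) -> float:
    """P(exactly one cell | at least one cell), assuming every seeded cell grows out."""
    if mean_cells_per_well <= 0:
        raise ValueError("mean must be positive")
    p0 = poisson_pmf(0, mean_cells_per_well)
    return poisson_pmf(1, mean_cells_per_well) / (1.0 - p0)


print("hybridomas screened (approx.):", 600 * 5)
for mean in (0.3, 0.5, 1.0):  # hypothetical seeding densities
    print(mean, round(p_single_cell_given_growth(mean), 3))
```

Lower seeding densities raise the chance that a growing well is monoclonal, at the cost of more empty wells. This trade-off is why limiting-dilution subcloning is often repeated.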

Cloning of dTAF110 cDNAs

The polyclonal antiserum obtained from the immunization scheme described above
was used at a 1/1000 dilution to screen approximately 5x10… plaques of a
size-selected (>1.5 kb) 9-12 hr λgt11 Drosophila cDNA library (Zinn et al., 1988).
Positive clones were plaque-purified to homogeneity and tested for
cross-reactivity against anti-TAF monoclonal antibodies of known specificity. One
clone, λ106, cross-reacted strongly with several independent anti-TAF110
hybridomas.

Insert DNA (2.6 kb) from λ106 was purified and labeled using Klenow polymerase and random hexamer priming (Amersham). 10" recombinant phage from a cDNA library (Poole et al., 1985) prepared from 3-9 hour Drosophila embryos were screened as previously described (Kadonaga et al., 1987). Twenty-four positives were obtained in duplicate on the primary screen; 12 of these were randomly selected for rescreening, and 10 of the 12 were positive on the secondary screen. All 10 of these cDNA clones were found to be related to each other on the basis of restriction mapping and cross-hybridization. The largest cDNA clone, λ110-5 (4.6 kb), was completely sequenced, and two other clones, λ110-1 (3.1 kb) and λ110-2 (2.1 kb), were partially sequenced. The inserts were subcloned into pBS-SK (Stratagene) in both orientations, a nested set of deletions was constructed with exonuclease III, and the clones were sequenced by the dideoxy method. The λ110-1 clone was found to be M nucleotides longer at the 5' end than the λ110-5 clone and to be missing 1.5 kb at the 3' end. SEQUENCE ID NO: 1 is a composite of the λ110-1 and λ110-5 sequences.
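SEQUENCE ID NO: 1 combines two overlapping clones: λ110-1 extends further 5', and λ110-5 extends further 3'. The sketch below shows one minimal way to build such a composite. It assumes both reads are on the same strand and share an exact overlap, and it joins them at their longest suffix-prefix match. The toy sequences are invented for illustration. A real assembly would also tolerate sequencing errors and check both strands.

```python
def merge_overlapping(upstream: str, downstream: str, min_overlap: int = 10) -> str:
    """Join two reads where the 3' end of `upstream` overlaps the 5' end of `downstream`.

    Uses the longest exact suffix/prefix match of at least `min_overlap` bases
    and raises ValueError if no such overlap exists.
    """
    max_len = min(len(upstream), len(downstream))
    for k in range(max_len, min_overlap - 1, -1):   # try the longest overlap first
        if upstream[-k:] == downstream[:k]:
            return upstream + downstream[k:]
    raise ValueError("no exact overlap of sufficient length")


# Hypothetical reads: clone_a carries extra 5' sequence, clone_b extra 3' sequence.
clone_a = "ATGACCGGTTACGATCGATCGGCTA"
clone_b = "GATCGATCGGCTAGGCTTAACCTGA"
composite = merge_overlapping(clone_a, clone_b)
print(composite)
```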

Expression of dTAF110 protein

An NdeI site was created at the initiating methionine using a PCR-based strategy. A 3.1 kb NdeI-BssHII fragment containing the entire coding sequence was subcloned into the SmaI site of the baculovirus expression vector pVL1392 (Pharmingen). Recombinant baculoviruses were selected by co-transfection of Sf9 cells with the expression vector and linear viral DNA as described by the supplier (Pharmingen). Samples for the Western blot were prepared by infecting Sf9 cells with recombinant virus obtained from the transfection supernatant. Three days after infection, the cells were harvested, washed, resuspended in HEMG buffer, and lysed by sonication. The protein concentration was measured by Bradford assay. After electrophoresis, proteins were transferred to nitrocellulose, and TAF110 protein was detected using the monoclonal antibody ^E7.
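An NdeI site is convenient at an initiating methionine because the NdeI recognition sequence, CATATG, contains the ATG start codon. A fragment cut at that site therefore begins exactly at the first codon. The sketch below assumes a PCR-mutagenesis design in which only the three bases immediately 5' of the ATG are changed to CAT. It shows that sequence edit and locates the resulting NdeI sites. The input sequence is hypothetical.

```python
NDEI_SITE = "CATATG"  # NdeI recognition sequence; the enzyme cuts CA^TATG


def find_sites(seq: str, site: str = NDEI_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def add_ndei_at_start(seq: str, atg_index: int) -> str:
    """Replace the three bases 5' of the ATG at `atg_index` with 'CAT', creating CATATG."""
    if seq[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3:
        raise ValueError("need three upstream bases to create the site")
    return seq[:atg_index - 3] + "CAT" + seq[atg_index:]


leader = "GGCTTAGCATGGCTGAAGAT"   # hypothetical 5' region
atg = leader.find("ATG")          # first ATG, taken as the start codon here
edited = add_ndei_at_start(leader, atg)
print(edited, find_sites(edited))
```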

Transfections

Transfection of Drosophila Schneider cells (line SL2) was carried out as previously described (Courey and Tjian, 1988), except that the transfections were performed in 60 mm dishes. The expression vector for all proteins used in this study was pPac, which contains the Drosophila actin 5C promoter. TAF110 sequences were fused in frame to the GAL4 DNA binding domain, residues 1-147. The following restriction fragments of the TAF110 cDNA were used: N137, NdeI-ClaI; N308, NdeI-SalI; 138-308, ClaI-SalI; 87-308, HincII. The constructs were checked by sequencing across the fusion junctions. The amounts of DNA used were as follows: 100 ng of the pPacGAL4 derivatives, M) wj of pPacSp1N539, and 2.5 μg of the reporter gene pG5BCAT (Lillie and Green, 1989). CAT assays were performed and quantitated as previously described (Courey and Tjian, 1988).

Yeast Methods

The yeast strain Y153 (a, gal4, gal80, his3, trp1-901, ade2-101, ura3-52, leu2-3,112, URA3::GAL1-lacZ, LYS2::GAL-HIS3) was transformed with two plasmids according to the method of Schiestl and Gietz (Schiestl and Gietz, 1989). The GAL4 DNA binding domain hybrids were constructed in the vector pAS1. pAS1 is a 2μ plasmid with TRP1 selection that expresses fusions to GAL4(1-147) from the ADH promoter. For expression of GAL4(1-147) alone, an XbaI linker containing stop codons in all three reading frames was inserted in pAS1 immediately downstream of the GAL4(1-147) coding sequence. G4-110 (fl) contains the entire coding region of TAF110 on an NdeI-BssHII fragment, and the shorter G4-110 fusions contain fragments as described for the Drosophila expression vectors. G4-80 (fl) contains an NdeI-XbaI fragment that includes the entire coding region of Drosophila TAF80. G4-40 (fl) contains an NdeI-EcoRV fragment encoding Drosophila TAF40. G4-dTBP(191C) contains an NdeI fragment derived from pAR191C containing the conserved C-terminal domain (Hoey et al., 1990). The reading frame across all fusion junctions was verified by sequencing. Protein expression was verified by Western blot analyses with either α-TAF or α-GAL4 antibodies, with the exception of G4-110(N137).

The acidic activation domain fusions were constructed in the vectors pGAD1F, pGAD2F or pGAD3F, which differ only in the reading frame of a unique BamHI site (Chien et al., 1991). These 2μ plasmids with LEU2 selection express fusions to activating region II (residues 768-881) of GAL4 from the ADH promoter. Sp1 region A consists of amino acids 83-262, and Sp1 region B consists of residues 263-542; these were cloned as BamHI-BglII fragments from the plasmids pKSARglO and pKSl^G, respectively. The C-terminal 100 amino acids of CTF1 (residues 399-499) were cloned as a BglII-EcoRI fragment (Mermod et al., 1989). The Antp construct was made by subcloning a BamHI fragment containing the activation domain (Courey et al., 1989). Bcd residues 249-489 (Driever et al., 1989) were cloned on a SalI fragment derived from pPac-bcd. The reading frame across all fusion junctions was verified by sequencing.

Transformed yeast were assayed qualitatively after growth on media containing X-gal. Quantitative β-galactosidase assays were performed as described (Himmelfarb et al., 1990), except that cells were grown to mid-log phase in selective media containing 2% glucose. Assays were performed in triplicate, and activity is expressed as units/mg of total protein.
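The protocol reports β-galactosidase activity as units/mg of total protein, averaged over triplicates. The sketch below converts an OD420 reading to nmol of o-nitrophenol per minute per mg of protein. It assumes the commonly used value that a 1 nmol/ml solution of o-nitrophenol gives an OD420 of 0.0045 in a 1 cm path. That constant and all input values are assumptions, not figures from the Himmelfarb et al. procedure.

```python
from statistics import mean, stdev

# Assumed OD420 of a 1 nmol/ml o-nitrophenol solution (1 cm path).
OD420_PER_NMOL_PER_ML = 0.0045


def specific_activity(od420: float, reaction_ml: float, protein_mg_per_ml: float,
                      extract_ml: float, minutes: float) -> float:
    """Beta-galactosidase activity in nmol ONPG hydrolysed per min per mg protein."""
    nmol_product = od420 * reaction_ml / OD420_PER_NMOL_PER_ML
    mg_protein = protein_mg_per_ml * extract_ml
    return nmol_product / (minutes * mg_protein)


# Hypothetical triplicate readings for one transformant.
replicates = [
    specific_activity(od, reaction_ml=1.0, protein_mg_per_ml=2.0, extract_ml=0.01, minutes=10)
    for od in (0.45, 0.50, 0.55)
]
print(f"{mean(replicates):.0f} +/- {stdev(replicates):.0f} units/mg")
```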

In vitro protein-protein interaction assay

A 3.1 kb NdeI-BssHII fragment containing the entire TAF110 coding region was subcloned into the plasmid pTbSTOP (Jantzen et al., 1992), which contains the β-globin untranslated leader downstream of a T7 promoter. The plasmid was linearized with XbaI, and the gene was transcribed in vitro with T7 RNA polymerase. 35S-Met-labeled protein was synthesized in vitro in a rabbit reticulocyte lysate (Promega). Alternatively, TAF110 was synthesized in vitro in an E. coli-derived S30 transcription/translation extract (Skelly et al., 1987). Sp1 protein was overexpressed in HeLa cells using a vaccinia virus expression vector (Jackson et al., 1990). It was purified by wheat germ agglutinin (WGA) affinity chromatography (Jackson and Tjian, 1990) before the DNA affinity purification outlined below.

DNA affinity resin was prepared as follows. 5'-Biotinylated oligonucleotides containing 4 Sp1 binding sites, GCACAGGGGCGGGGCOjT, and their complement were annealed and coupled to streptavidin-agarose beads (Pierce) by incubating overnight at room temperature. The beads were incubated with WGA-purified Sp1 in buffer Z' (25 mM HEPES, pH 7.6, 20% glycerol, 0.1% NP-40, 10 mM ZnSO4, 1 mM DTT) containing 0.1 M KCl for 2 hours at 4 °C. Sp1 was bound to the resin at a concentration of approximately 1 mg/ml of beads. 35S-labeled TAF110 was incubated in batch with 15 ml of the DNA affinity resin in Z' + 50 mM KCl, with or without Sp1, for 4 hours at 4 °C. The beads were washed 4 times with 1 ml of the same buffer and eluted with Z' + 0.2 M KCl, followed by Z' + 1.0 M KCl. The eluted proteins were TCA-precipitated and analyzed by SDS-PAGE. Before autoradiography, the gel was fixed and treated with Amplify (Amersham).

Detection of Direct TBP/TAF Interactions on Protein Blots

Immunopurification of the Drosophila TFIID complex using anti-TBP antibodies yields a large multiprotein complex consisting of TBP and 7 major TAFs. To identify TAFs that can bind directly to TBP, we probed a blot containing renatured TAFs with a 32P-labelled TBP-GST fusion protein. After unbound TBP-fusion protein was washed off and the blot was exposed to X-ray film, a strong signal was seen that coincided with the position of dTAFII-250K on the gel. Further experiments revealed that a truncated version of TBP, consisting of the highly conserved C-terminal domain, is sufficient to mediate this interaction. We also tested other fractions containing basal factors (Wampler et al., 1990; Dynlacht et al.), including TFIIB, E/F and RNA polymerase II, and failed to detect specific signals. We conclude that TBP and TAFII-250K interact directly and that TAFII-250K is present in the TFIID fraction but is not associated with TFIIB, E, F or RNA polymerase II.

Molecular Cloning and Characterization of the dTAFII-250K Gene

Having identified dTAFII-250K as a candidate for a direct TBP-TAF interaction, we decided to clone the corresponding gene. The low abundance and large size of dTAF(II)-250K disfavour cloning strategies based on protein microsequencing. Instead, we obtained monoclonal antibodies that specifically (and exclusively) recognize dTAF(II)-250K on Western blots. To show that dTAF(II)-250K is a genuine component of the TFIID complex, we used two of these monoclonal antibodies, 2B2 and 30H9, to carry out immunoprecipitations from the TFIID fraction. The pattern and stoichiometry of TAFs and TBP are indistinguishable from those described previously using either anti-TBP (Dynlacht et al., 1991) or anti-dTAF(II)-110K (Hoey et al., 1993) antibodies. We cloned the gene encoding Drosophila dTAF(II)-250K by screening a λgt11 expression library prepared from 6-12 hour old embryos (Zinn et al., 1988) with hybridoma supernatants containing either the 2B2 or the 30H9 anti-dTAF(II)-250K monoclonal antibody. Five partial cDNA clones were obtained, all of which cross-hybridized with each other at high stringency. Restriction mapping and sequence analysis confirmed that they were derived from the same gene. Two of these cDNAs, ID-1 and ID-2, allowed us to establish a composite open reading frame spanning 4.5 kb (Fig. 2). Attempts to isolate additional cDNA clones encoding N-terminal regions of dTAF(II)250, and 5'-RACE experiments, have so far been unsuccessful. Genomic DNA sequencing allowed us to extend the open reading frame by approximately J kb before encountering noncoding (presumably intronic) sequences. The open reading frame encoded by the cDNA clones specifies a protein sequence with extensive similarity to the human Cell Cycle Gene 1 (CCG1) gene previously described by Sekiguchi et al., 1991. Many of the sequence elements defined in the CCG1 genes are also present in the dTAF-250K encoding sequence. Interestingly, however, we detected a 35 amino acid insertion in the region that Sekiguchi et al. putatively identified as an HMG box. This insertion substantially disrupts the alignment with the consensus sequence. We also used the ID-2 cDNA fragment to map the dTAFII-250K gene to position 32E1-2 (left arm of chromosome II) by in situ hybridization. This location does not contain any previously characterized genes, and no deletions spanning that region are currently available. dTAF-250 appears to be present in all or most of the TFIID complexes within cells and to provide essential contact points with TBP and other TAFs (see below). We therefore expect that a deletion of the 32E1-2 locus would cause a lethal phenotype.

Expression of the C-terminal Domain of dTAF(II)-250K in Insect Cells

To study the functional properties of the proteins encoded by these cDNAs, we decided to express the protein encoded by the reading frame of our longest cDNA, ID-1. Because of the expected large size of the encoded protein, we chose the baculovirus system. After subcloning the fragment into the expression vector pVL1393 and transfecting the construct into Sf9 cells, we detected expression of a 180K protein (subsequently referred to as DN250). This protein cross-reacted strongly with several anti-TAF250 monoclonal antibodies recognizing a variety of epitopes in different parts of the 250K TAF. We detected no cross-reactivity between our antibodies and any endogenous Spodoptera TAF250 homologs that might be present in Sf9 cells.

The C-terminal Domain of dTAF(II)-250K Is Sufficient for TBP Binding
To determine whether DN250 was capable of interacting with TBP, we immunopurified the protein from infected cells. Monoclonal antibody 30H9 was bound to protein A or G beads and incubated with extracts from baculovirus-infected cells. Under these conditions, DN250 is specifically immobilized on the beads. After washing off unbound material, we added an extract containing partially purified TBP (also expressed in the baculovirus system). TBP bound specifically to beads carrying the immunopurified TAF250-C180 protein, whereas beads carrying antibody alone did not. Protein blots provided further evidence for this direct TBP-TAF interaction. A protein representing approximately 60% of the full-length 250K protein binds TBP, which demonstrates conclusively that the cloned C-terminal part is sufficient for TBP binding.

Gelshift Analysis of the DN250/TBP Complex

TBP is the only component of the general transcriptional machinery capable of sequence-specific binding to the TATA box. We were therefore interested in how the interaction of TBP with TAF250-C180 affects the specificity and affinity of DNA binding. TBP was added to a 32P-labelled DNA fragment containing the -33 to +55 region of the adenovirus major late promoter, and DNA binding was monitored using a gelshift assay. The intensity of probe DNA shifted by TBP increased substantially in the presence of purified TAF250-C180, whereas TAF250-C180 alone did not detectably bind to DNA. We then asked whether this enhanced affinity of the TBP/TAF250-C180 complex for DNA was due to additional DNA contacts provided by the TAF250-C180 protein. To address this, we carried out footprinting studies, again using the adenovirus major late promoter region as a probe.
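Gelshift signals like these are often compared as the fraction of probe shifted in each lane, that is, shifted counts divided by shifted plus free counts. The sketch below computes that fraction and the fold change between two lanes. It assumes background subtraction has already been done. The band intensities are invented for illustration and are not data from this study.

```python
def fraction_bound(shifted: float, free: float) -> float:
    """Fraction of labelled probe found in the shifted complex for one lane."""
    total = shifted + free
    if total <= 0:
        raise ValueError("lane has no signal")
    return shifted / total


# Hypothetical background-corrected band intensities (arbitrary units).
tbp_alone = fraction_bound(shifted=1200.0, free=8800.0)
tbp_plus_taf = fraction_bound(shifted=4500.0, free=5500.0)
print(f"TBP alone: {tbp_alone:.2f}; TBP + TAF250-C180: {tbp_plus_taf:.2f}; "
      f"fold increase: {tbp_plus_taf / tbp_alone:.1f}")
```

Normalizing each lane to its own total signal keeps small loading differences between lanes from being read as changes in binding.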

TAF250 and TAF110 Specifically Interact With Each Other, even in the Absence of TBP

Since we have not observed any of the cloned Drosophila TAFs binding to TBP, we investigated whether they would interact with the TBP/d250KdeltaC180 complex. 35S-labelled 110K protein (Hoey et al., 1993) was synthesized in an in vitro translation system and incubated with TAF250-C180 protein in the presence and absence of TBP. As shown in Fig. 5, the 110K TAF binds specifically to dTAF(II)250K-C180 both in the presence and in the absence of TBP. This indicates that the two proteins bind independently to two distinct domains within the 250K TAF. The affinity and specificity of this interaction are high enough that TAF250-C180 immobilized on beads can selectively purify TAF110 from a crude extract of baculovirus-infected cells expressing the recombinant protein.

Protein Blot Analysis 

pGEX-2TK was linearized with SmaI, phosphatase-treated and then ligated with gel-purified NdeI fragments of either pARdTFIID or pARdTFIID-191C (Hoey et al., 1990). Generation of 32P-labelled GST fusion protein, protein blotting and hybridization were carried out essentially as described in Kaelin et al., 1992.

Generation of anti-dTAFII-250K Hybridoma Cell Lines

The monoclonal antibodies described in this study were derived as described in Hoey et al. (1993). Briefly, a Swiss-Webster mouse was immunized with intact immunopurified Drosophila TFIID complex. After fusion, hybridoma supernatants containing anti-dTAFII-250K antibodies were selected using strip blots containing SDS-gel-separated TBP and TAFs. Two such cell lines, 2B2 and 30H9, were then cloned to homogeneity by limiting dilution.

Isolation of dTAFII-250K cDNA and Genomic Clones

Approximately 5 x 10^5 independent plaques of a size-selected (>=1.8 kb) Drosophila λgt11 library prepared from Drosophila embryos (Zinn et al., 1988) were screened with two independent anti-dTAFII-250K monoclonal antibodies, 2B2 and 30H9. All the positives identified cross-hybridized with each other at high stringency at the DNA level. Restriction mapping and sequence analysis showed that all of the clones were derived from the same gene. cDNA clones ID1 and ID2 contained inserts of 1.^ and 4.0 kb, respectively, and were sequenced to completion. ID2 was found to extend 500 bp further towards the 5' end of the gene and was used to isolate genomic clones IDASH3 and IDASH4 (Sau3A partially digested DNA cloned into λDASH).

Sequencing Strategy

We employed the γδ transposon-directed sequencing strategy (Gold Biosystems) as described in Strathmann et al., 1991. DNA fragments of interest were subcloned into the plasmid vector pMOB1 and electroporated into DPWC cells. After conjugation with the recipient host BW26, the mixture was plated on kanamycin/carbenicillin plates. Transposon insertion points were mapped by PCR. Clones with the desired transposon locations were then grown up and sequenced using transposon-specific primers, either with 35S-dATP or on the Pharmacia A.L.F. Sequencer.
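In transposon-directed sequencing, insertion points are first mapped by PCR. A subset of clones is then chosen so that reads primed outward from the transposon ends tile the whole insert. The sketch below is a minimal greedy version of that selection step, not the procedure used here. It simplifies by assuming each chosen insertion yields readable sequence for `read_len` bases on both sides of the insertion point. The insertion positions and read length are hypothetical.

```python
def choose_insertions(positions: list[int], insert_len: int, read_len: int) -> list[int]:
    """Greedily pick insertion points whose bidirectional reads cover bases 0..insert_len.

    An insertion at p is assumed to yield sequence over [p - read_len, p + read_len].
    Returns the chosen positions, or raises ValueError if a gap cannot be closed.
    """
    windows = sorted((max(0, p - read_len), min(insert_len, p + read_len), p) for p in positions)
    chosen, covered, i = [], 0, 0
    while covered < insert_len:
        best_end, best_pos = covered, None
        # Among windows starting at or before the covered edge, take the one reaching furthest.
        while i < len(windows) and windows[i][0] <= covered:
            start, end, p = windows[i]
            if end > best_end:
                best_end, best_pos = end, p
            i += 1
        if best_pos is None:
            raise ValueError(f"gap in coverage after base {covered}")
        chosen.append(best_pos)
        covered = best_end
    return chosen


# Hypothetical PCR-mapped insertions in a 3 kb subclone, assuming 400-base reads.
mapped = [350, 700, 1100, 1500, 1600, 2300, 2700]
picked = choose_insertions(mapped, insert_len=3000, read_len=400)
print(picked)
```

Picking the window that reaches furthest at each step is the standard greedy rule for interval covering. It yields a minimal number of clones for the stated coverage model.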

Expression of a Truncated Version of dTAFII-250K (DN250) in the Baculovirus System

cDNA #5 was inserted into the EcoRI site of the baculovirus expression plasmid pVL1393 (Pharmingen). The resulting construct was co-transfected with 'BaculoGold' viral DNA (Pharmingen) into Sf9 cells. After 3 days, cells were harvested, and expression of the DN250 protein was monitored by Western blotting using the anti-dTAFII-250K monoclonal antibody 2B2. The recombinant virus-containing supernatant was used to infect large-scale cultures of Sf9 cells. We typically prepared whole cell extracts from 1 liter of plate cultures of infected Sf9 cells by sonicating them in HEMG-ND/0.1 M KCl (HEMG-ND contains 25 mM HEPES, pH 7.6, mM MgCl2, 0.1 mM EDTA, 0.1% NP40, 1 mM PMSF, 1.5 mM DTT, 5 mg/ml leupeptin). The supernatant was partially purified (approximately 5-fold) by chromatography over Q-Sepharose (Pharmacia) with step gradient elution (HEMG containing 0.1, 0.2, 0.4 and 1.0 M KCl, respectively). dTAFII-250K(C180) eluted in the 0.4 M step (the '0.4' fraction). After dialysis against HEMG-0.1 M KCl, the extract was frozen in aliquots and used for the immunopurification/coprecipitation studies.
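The "approximately 5-fold" figure for the Q-Sepharose step is a purification factor. It is the target signal per mg of total protein after the step, divided by the same ratio before it. The sketch below calculates that factor together with the step yield. The numbers are hypothetical, and the target "signal" is assumed to be something like a quantified Western blot band.

```python
from dataclasses import dataclass


@dataclass
class Fraction:
    total_protein_mg: float
    target_signal: float  # e.g. quantified immunoblot signal, arbitrary units

    @property
    def specific(self) -> float:
        """Target signal per mg of total protein."""
        return self.target_signal / self.total_protein_mg


def purification_summary(load: Fraction, pool: Fraction) -> tuple[float, float]:
    """Return (fold purification, percent yield) for one chromatography step."""
    fold = pool.specific / load.specific
    yield_pct = 100.0 * pool.target_signal / load.target_signal
    return fold, yield_pct


# Hypothetical values: extract loaded onto the column vs. the 0.4 M KCl step fraction.
fold, recovery = purification_summary(Fraction(100.0, 1000.0), Fraction(12.0, 600.0))
print(f"{fold:.1f}-fold purification, {recovery:.0f}% yield")
```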
Coimmunoprecipitation Studies

Protein G beads were preloaded with monoclonal antibodies and incubated with various cell extracts from baculovirus-infected cell fractions or with 35S-labelled dTAFII110 prepared by in vitro translation. After 45 minutes on ice, unbound protein was removed with several washes with HEMG-ND.

hTAFII250 purification and cloning

We previously reported the isolation of hTFIID by affinity chromatography using antibodies specific to TBP. The purified complex contains at least seven distinct TAFs, ranging in molecular weight from 30 to 250 kD, which copurify with TBP. We were particularly interested in characterizing the 250 kD species, because Far Western analysis indicates that this subunit of TFIID binds TBP directly. Using affinity-purified TAFs to immunize mice, we generated both polyclonal and monoclonal antibodies that crossreact with different TAFs. We used these antibodies to screen λgt11 expression cDNA libraries and isolated several clones, including IH1, which contains a 1.1 kb insert. To determine which TAF, if any, is encoded by IH1, we expressed this cDNA as a GST fusion protein. We purified the tagged protein by glutathione affinity chromatography and raised antibodies against it. Antisera directed against GST-IH1 specifically crossreacted with the 250 kD TAF, indicating that a portion of the gene encoding hTAFII250 had been isolated.

Next, we determined the DNA sequence of IH1 and found that this open reading frame is related to the previously identified human gene CCG1, which had been implicated in cell cycle regulation. Specifically, at the non-permissive temperature, a temperature-sensitive mutant hamster cell line, ts13, arrests in G1 a few hours before entering S phase. Expression of human CCG1 in ts13 overcomes this cell cycle block. Since IH1 encoded only a small portion of hTAFII250, we isolated several additional clones from a primary HeLa cDNA library, including IH2, which contained a 5.3 kb insert. Construction of a full-length hTAFII250 cDNA revealed that the predominant hTAFII250 RNA species characterized in HeLa cells encodes 21 additional amino acids between residues 177 and 178 relative to CCG1. Interestingly, we sequenced several other cDNAs containing internal insertions or deletions when compared to CCG1. This
finding suggests that multiple hTAFII250-related proteins may be generated by alternative splicing of a primary transcript.

The finding that a cDNA isolated with antibodies directed against TAFs encodes a cell cycle gene is exciting, but functional evidence was needed that this clone encodes a bona fide TAF that is a subunit of TFIID. We first asked whether recombinant hTAFII250 expressed in a vaccinia virus system becomes associated with the endogenous TFIID complex in HeLa cells. To distinguish between the recombinant and endogenous proteins, we engineered a version of hTAFII250 carrying a hemagglutinin antigen (HA) epitope at its N-terminus. Antibodies against TBP were used to immunopurify the TFIID complex from HeLa cells infected with either recombinant or control vaccinia virus. The immunopurified complexes were subjected to gel electrophoresis and analyzed by Western blotting. Blots were probed either with a monoclonal anti-HA antibody, to detect the HA-tagged molecule, or with monoclonal antibody 6B3, raised against endogenous hTAFII250. The anti-HA antibody specifically detected a 250 kD protein in the TFIID complex from HeLa cells infected with recombinant hTAFII250 virus, but not in the complex from control-infected cells. As expected, 6B3 recognized both the recombinant hTAFII250 and the endogenous protein. We therefore conclude that recombinant hTAFII250 associates with TBP in vivo and is part of the TFIID complex.

To test for a direct interaction between hTAFII250 and TBP, we performed a Far Western analysis with radiolabelled TBP and antibody-immunopurified HA-tagged hTAFII250. Full-length hTAFII250 interacts directly with TBP in vitro, even in the absence of other TAFs or coactivators. These results, together with the analysis of the independently cloned Drosophila TAFII250, suggest that this largest TAF is responsible for the initial assembly of the TFIID complex by binding directly to TBP and other TAFs.

Because hTAFII250 plays an important role in the formation of the TFIID complex, we next defined its interaction with TBP more precisely. For these studies, we used the two-hybrid system in yeast cells, a rapid and convenient assay for protein-protein interactions. A hybrid construct containing hTAFII250 fused to the DNA binding domain of GAL4, G4(1-147), interacted selectively and efficiently with human TBP attached to the acidic activation domain of GAL4, G4(768-881). Yeast expressing both of these proteins produced high levels of β-galactosidase due to increased transcription of a lacZ reporter construct containing GAL4 binding sites. Interestingly, hTAFII250 also interacts efficiently with a truncated version of human TBP that contains only the conserved C-terminal 180 amino acids. By contrast, a construct containing the "species-specific" N-terminal domain of human TBP failed to interact with hTAFII250. These results agree with Far Western experiments using radiolabelled cTBP and nTBP as probes. They suggest that residues 160 to 339, on the outer surface of TBP, may be responsible for hTAFII250 binding.
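The two-hybrid results narrow down the hTAFII250 contact region on TBP by deletion mapping. Full-length TBP (residues 1-339) and the C-terminal fragment (160-339) interact, whereas the N-terminal fragment (1-159) does not. The sketch below encodes that inference. It assumes a single contiguous binding region that lies entirely within every fragment that binds. The function intersects the binding fragments and checks that no non-binding fragment contains the whole region. It also confirms that residues 160-339 are the 180 C-terminal residues.

```python
Interval = tuple[int, int]  # inclusive residue range (start, end)


def map_binding_region(results: dict[Interval, bool]) -> Interval:
    """Intersect all binding fragments; the minimal region must lie inside each of them."""
    positives = [frag for frag, binds in results.items() if binds]
    if not positives:
        raise ValueError("no binding fragment")
    start = max(s for s, _ in positives)
    end = min(e for _, e in positives)
    if start > end:
        raise ValueError("binding fragments do not overlap")
    # A non-binding fragment that contains the whole region would contradict the model.
    for (s, e), binds in results.items():
        if not binds and s <= start and end <= e:
            raise ValueError(f"fragment {s}-{e} contains the region but does not bind")
    return start, end


# Two-hybrid results for human TBP fragments, as reported above.
tbp_results = {(1, 339): True, (160, 339): True, (1, 159): False}
region = map_binding_region(tbp_results)
length = region[1] - region[0] + 1
print(region, length)
```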

Our unexpected finding that hTAFII250 is related to CCG1 suggests an intriguing link between a subunit of TFIID and the expression of genes involved in cell cycle control. CCG1 is a nuclear phosphoprotein with several domains characteristic of transcription factors, including a putative HMG box and a proline-rich cluster. Based on these structural motifs, Sekiguchi et al. suggested that CCG1 might act as a sequence-specific transcription factor regulating genes involved in progression through G1. However, it now seems clear that CCG1, or a related product, is part of the TFIID complex and is not a promoter-specific transcription factor. It therefore seems more likely that the G1 arrest in ts13 reflects the failure of a defective TFIID complex to mediate activation by a subset of cellular transcription factors. These factors would govern cell cycle genes such as the thymidine kinase and dihydrofolate reductase genes. The presence of a putative DNA binding domain, the HMG box, may also suggest that part of this large TFIID subunit contacts DNA once hTAFII250 has formed a complex with TBP, perhaps downstream of the initiation site.

Immunoaffinity-purified hTFIID complex: Interaction with hTBP and production of hTAF-specific antibodies

A. Immunoprecipitation reactions were carried out according to a modified version of previously described procedures (Tanese et al.). 0.5 mg of affinity-purified α-hTBP antibody was added to 200 mg of hTFIID (phosphocellulose 0.48-1.0 M KCl fraction), and the mixture was nutated for 2-4 hrs at 4 °C. Protein A Sepharose was then added, and nutation continued for an additional 2-4 hrs. Antibody-antigen complexes were pelleted by low-speed centrifugation and washed four times with 0.1 M KCl-HEMG (25 mM HEPES, 12.5 mM MgCl2, 0.1 mM EDTA, 10% glycerol) containing 0.1% NP-40 and 1 mM DTT. The immunoprecipitated hTFIID complex was subjected to 8% SDS-PAGE and silver stained. For Far Western analysis, the proteins were blotted onto nitrocellulose membrane and hybridized with 35S-labeled hTBP (Kaelin et al.). pTbhTBP was used to transcribe hTBP RNA in vitro, and the RNA was translated in vitro in reticulocyte lysate (Promega) using 120 mCi 35S-methionine (>1000 Ci/mmol, Amersham).
B. Antigen used to immunize mice for antibody production was prepared as follows. The immunoprecipitated hTFIID complex, purified from 250 litres of HeLa cells, was eluted from the Protein A Sepharose-antibody complex with 0.1 M KCl-HEMG containing 1 M guanidine-HCl, 0.1% NP-40, and 1 mM DTT. Under these conditions TBP remained bound to the antibody. The eluted TAFs were dialyzed against 0.1 M KCl-HEMG containing 0.1% NP-40 and 1 mM DTT. The protein mixture, containing 1-2 mg of each TAF, was used to immunize a mouse. Test bleeds were taken, and the immune response was monitored by Western blot analysis. After a series of five boosts, the mouse was sacrificed, and the spleen was used to produce monoclonal antibody-producing hybridoma cell lines. Hybridoma cell lines producing hTAF-specific antibodies were identified by Western blot analysis of eluted TAFs.

Cloning and identification of the 250 kD subunit of the hTFIID complex as CCG1

A. An expression screen of 2.4 x 10^6 PFU from a λgt11 HeLa S3 cDNA library (Clontech) was carried out using the α-hTAFs polyclonal serum described above. Thirty-eight primary signals were identified, of which 6 were plaque purified. λ phage DNA was prepared and analyzed by EcoRI restriction enzyme digestion. IH1 contained a 1.1 kb insert, which was subcloned into the EcoRI site of pGEX1 (Pharmacia) to express a GST-IH1 fusion protein. The resulting construct was transformed into Escherichia coli TG2. After induction with 0.5 mM IPTG, the induced protein was purified on glutathione Sepharose 4B beads (Pharmacia). 2 mg (per injection) of the fusion protein was used to immunize a mouse. Test bleeds were taken and used for Western blot analyses.
B. Poly(A)+ RNA from HeLa cells was used to construct a directional cDNA library in λZAPII (Stratagene) as described previously (Ruppert et al., 1992). Using a randomly 32P-labeled probe derived from the IH1 cDNA insert, 15 independent cDNA clones were isolated from 1.2 x 10^6 original PFU. The cDNA inserts were rescued by the zapping procedure (Short et al.) and characterized extensively by restriction enzyme analysis and Southern blotting. The longest cDNA clone, IH2, contains a 5.3 kb insert. Compared to CCG1, it has an extended 3' untranslated region but lacks about 1.15 kb of 5' sequence. This 5' region was generated by PCR using conditions described previously (Ruppert et al.). Two sets of PCR primers were designed according to the CCG1 cDNA sequence (Sekiguchi et al.). PCR-I: forward primer #1, 5'-TAnTGGGGCATATGGGAGGGGGGTG-3' (position 40 to 65, containing an engineered NdeI restriction site at the translation start codon), and reverse primer #2, 5'-GAACiTGCACTTTGTCACCAG-3' (position 578 to 597). PCR-II: forward primer #3, 5'-TAGGAC;GAGCATATGGGGAGGTTGCAG-3' (position 421 to 447), and reverse primer #4, 5'-GCTCTAAGGAAGCCAC;CCTGCCAGGCTrG-3' (position 1343 to 1371). All PCR products were subcloned into pBluescript KS (Stratagene) and sequenced. The most abundant product of PCR-II, a 1 kb fragment, included a 63 bp in-frame insertion. A minor 330 bp fragment revealed a 618 bp in-frame deletion with respect to the CCG1 cDNA. To generate a full-length hTAFII250 cDNA, the PCR-I product and the 1 kb PCR-II product were joined via their shared SmaI restriction site. The 1.2 kb XbaI fragment of the resulting plasmid was then cloned into XbaI-cut pH2 to generate the full-length cDNA clone phTAFII250.
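The PCR-II product sizes can be checked against the primer positions. The sketch below assumes that the quoted positions are inclusive and that the primers add no extra 5' tails. Under those assumptions, the unmodified CCG1 amplicon would be 951 bp. Adding the 63 bp insertion gives about 1 kb, and removing the 618 bp deletion gives about 330 bp, matching the two observed products. Both indels are multiples of three, so they preserve the reading frame. The 63 bp insertion corresponds to 21 codons, which matches the 21 additional amino acids noted for the predominant hTAFII250 RNA species.

```python
def amplicon_length(fwd_start: int, rev_end: int) -> int:
    """Length of a PCR product spanning reference positions fwd_start..rev_end inclusive."""
    return rev_end - fwd_start + 1


def frame_effect(indel_bp: int) -> str:
    """Describe how an insertion or deletion of `indel_bp` bases affects the reading frame."""
    if indel_bp % 3 == 0:
        return f"in frame: {indel_bp // 3} codons"
    return f"frameshift: {indel_bp % 3} base(s) left over"


expected = amplicon_length(421, 1371)   # PCR-II, primers #3 and #4 on the CCG1 sequence
with_insertion = expected + 63          # the abundant ~1 kb product
with_deletion = expected - 618          # the minor ~330 bp product
print(expected, with_insertion, with_deletion)
print(frame_effect(63), "|", frame_effect(618))
```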

Analyses of hTAFII250 and hTBP interaction

A. To construct an HA-tagged version of hTAFII250, we generated a plasmid, pSK-HAX, containing the hemagglutinin antigen (HA) epitope, a factor X cleavage site, and an in-frame NdeI cloning site. A 6.3 kb NdeI/Asp718 fragment from phTAFII250 was inserted into pSK-HAX to generate pHAX-hTAFII250. A 6.0 kb SpeI fragment of this plasmid, containing the complete coding region of hTAFII250, was inserted into the XbaI site of the vaccinia virus expression vector pAbT4537 (Applied bioTechnology Inc.). HeLa cells were infected with either the recombinant virus, vhTAFII250, or a control virus (New York City Board of Health strain of vaccinia virus) (Dynlacht 1989). Extracts from the infected cells were fractionated by phosphocellulose chromatography as described (Tanese et al.). hTFIID complexes from the 0.48-1.0 M KCl fraction were immunoprecipitated with affinity-purified α-hTBP antibodies, subjected to 8% SDS-PAGE and analyzed by Western blotting.

B. To generate an HA-tagged version of hTAFII250 in the baculovirus expression system, we first generated new baculovirus vectors, pVL1392HAX and pVL1393HAX, derived from pVL1392 and pVL1393 (Pharmingen), respectively. These vectors contain the HA antigen epitope, a factor X cleavage site, and unique in-frame NcoI and NdeI restriction sites. A 6.0 kb NdeI/SpeI fragment from phTAFII250 was inserted into pVL1392HAX, creating pbHAX-hTAFII250. Whole cell extracts from either uninfected Sf9 cells or Sf9 cells infected with recombinant baculovirus were prepared in 0.4 M KCl-HEMG (including 0.04% NP-40, 1 mM DTT, 0.2 mM AF-R.sr, 0.1 mM NaMBS). These extracts were used directly for immunoprecipitation with the α-HA antibody. The precipitate was subjected to 8% SDS-PAGE and blotted onto nitrocellulose membrane. The filter was probed first with 35S-labeled hTBP and subsequently with the monoclonal antibody 6B3.

hTAFII250 interacts with hTBP in yeast

hTAFII250 fused to the DNA-binding domain of GAL4 (residues 1-147)
was constructed by inserting a 6.0 kb NdeI/BamHI fragment derived from
pvhTAFII250 into the pAS1 vector. The activation domain fusions were obtained
by cloning inserts into the pGAD1F vector (Chien et al.). The hybrid proteins
generated included the acidic activation domain of GAL4 (residues 768-881) fused
to full-length hTBP, to hTBP residues U»()-3.'^9, or to hTBP residues 1-159. The
constructs described above were transformed into the yeast strain Y153 (a, gal4,
gal80, his3, trp1-901, ade2-101, ura3-52, leu2-3,112, URA3::Gal1:lacZ,
LYS2::Gal-His3) as described (Chien et al.), and β-galactosidase assays were
performed according to published procedures (Hoey et al.).
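The β-galactosidase assays follow published procedures (Hoey et al.) that are not reproduced here. If a liquid ONPG assay is used, activity is commonly expressed in Miller units, 1000 × A420 / (t × V × A600), with t in minutes and V the culture volume in mL. The sketch below assumes that simplified form (without the A550 light-scattering correction); the readings in the example are made-up numbers for illustration, not data from these experiments.

```python
"""Miller-unit calculation for a liquid ONPG beta-galactosidase assay.

Assumes the simplified Miller formula; the assay format actually used
is given in the cited procedures, not here.
"""


def miller_units(a420: float, a600: float, minutes: float, volume_ml: float) -> float:
    """Return beta-galactosidase activity in Miller units.

    a420      -- absorbance of the o-nitrophenol product at 420 nm
    a600      -- culture density (A600) at harvest
    minutes   -- reaction time in minutes
    volume_ml -- volume of culture assayed, in mL
    """
    if a600 <= 0 or minutes <= 0 or volume_ml <= 0:
        raise ValueError("a600, minutes and volume_ml must be positive")
    return 1000.0 * a420 / (minutes * volume_ml * a600)


def fold_activation(test: float, control: float) -> float:
    """Ratio of a two-hybrid pair's activity to a control such as bait alone."""
    if control <= 0:
        raise ValueError("control activity must be positive")
    return test / control


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    pair = miller_units(a420=0.8, a600=0.5, minutes=20, volume_ml=0.1)
    bait_alone = miller_units(a420=0.02, a600=0.5, minutes=20, volume_ml=0.1)
    print(f"{pair:.1f} vs {bait_alone:.1f} Miller units, "
          f"{fold_activation(pair, bait_alone):.0f}-fold")
```

Normalizing to A600 and time makes readings from cultures of different density comparable, which is what a GAL4 fusion-pair versus control comparison needs.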


Drosophila TBP and dTAFII250 interact with the C-terminal portion of dTAFII150
Radiolabeled, in vitro translated dTAFII150 bound efficiently to immobilized
HA-dTBP or dTAFII250ΔN (see Weinzierl et al. (1993) Nature 362, 511-517). In
contrast, dTAFII110 and other TAFs failed to interact selectively with dTAFII150,
showing that dTAFII150 interacts with at least two subunits of the TFIID complex,
dTBP and dTAFII250, which also contact each other.

We also carried out in vivo experiments in which insect Sf9 cells were
co-infected with two recombinant baculoviruses, one expressing dTAFII150 and the
second expressing either TBP or one of the other TAFs. Complexes were
subsequently immunopurified from cellular lysates and analyzed by SDS-PAGE
followed by immunoblotting using antibodies directed against dTAFII150.
Coinfection of viruses expressing dTAFII150 and either HA-dTBP or dTAFII250ΔN
resulted in efficient formation and copurification of heteromeric complexes.
Similarly, full-length hTAFII250 bound efficiently to dTAFII150.

The radiolabeled, in vitro translated C-terminal 369-residue portion
(dTAFII150ΔN) of this protein binds TBP and dTAFII250ΔN with the same
efficiency as the full-length protein. No significant binding of an N-terminal
786-residue portion (dTAFII150ΔC) was observed; i.e., the interaction interfaces
for these proteins are located in the C-terminal portion of dTAFII150.
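The deletion constructs are named by the terminal stretch they keep: dTAFII150ΔN keeps the C-terminal 369 residues and dTAFII150ΔC keeps the N-terminal 786. The helper below turns such descriptions into 1-based, inclusive residue ranges and measures their overlap. The full-length value of 1200 in the example is a placeholder; the length of dTAFII150 is not stated in this section.

```python
"""Convert 'N-terminal n' / 'C-terminal n' fragment descriptions into
1-based inclusive residue ranges of a protein of known length."""

from typing import Literal, Tuple

Range = Tuple[int, int]


def terminal_fragment(length: int, n: int, end: Literal["N", "C"]) -> Range:
    """Return (first, last) residue numbers of the kept fragment."""
    if not 0 < n <= length:
        raise ValueError("fragment size must be between 1 and the protein length")
    if end == "N":
        return 1, n
    if end == "C":
        return length - n + 1, length
    raise ValueError("end must be 'N' or 'C'")


def overlap(a: Range, b: Range) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


if __name__ == "__main__":
    full_length = 1200  # placeholder, NOT the real dTAFII150 length
    delta_n = terminal_fragment(full_length, 369, "C")  # keeps C-terminal 369
    delta_c = terminal_fragment(full_length, 786, "N")  # keeps N-terminal 786
    print(delta_n, delta_c, overlap(delta_n, delta_c))
```

Because 369 + 786 = 1155, the two fragments share residues only if the full-length protein is shorter than 1155 residues; with the placeholder length they are disjoint.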

TSM-1 associates with TBP and TAFII250

Like dTAFII150, TSM-1ΔN (C-terminal 920-residue portion) bound
efficiently to yTBP as well as HA-dTBP; hence we conclude that yeast contains a
TAFII250 and that TSM-1 is a TAF.

The activation domain of the Drosophila regulator NTF-1 (Neurogenic
Element Binding Transcription Factor-1) interacts with dTAFII150

NTF-1 immuno-copurifies with dTFIID using anti-dTBP antibodies, indicating that
one or more subunits of dTFIID interact directly with NTF-1. In
coimmunoprecipitation experiments, dTAFII150 was immunopurified from Sf9
extracts containing dTAFII150, the immobilized TAF was mixed with recombinant
NTF-1, the isolated complex was analyzed by SDS-PAGE, and the presence of
NTF-1 was detected by protein immunoblot analysis, showing that NTF-1 interacts
directly with dTAFII150.

Next, we used a GST-NTF-1 fusion protein containing the N-terminal 284
amino acids of NTF-1 to bind various truncated versions of dTAFII150, showing
that the N-terminal, but not the C-terminal, region of dTAFII150 bound to the
N-terminal extended activation domain of NTF-1. Neither dTAFII80 nor dTAFIMO
bound significantly under these conditions.

Using an affinity resin containing a covalently attached synthetic peptide
corresponding to the 56-amino-acid minimal activation domain of NTF-1, we
showed that this region is sufficient to interact with dTAFII150 and that the
activator interface of dTAFII150 is distinct from the C-terminal region that
interacts with dTBP and dTAFII250. Hence, the requirement for TAFs during
NTF-1 activation is mediated, at least in part, by NTF-1:dTAFII150 interactions.

TAF Sequence Data

Nucleotide and amino acid sequences of:
dTAFII30α (SEQ ID NO: 21, 22)
dTAFII30β (SEQ ID NO: 23, 24)
dTAFII40 (SEQ ID NO: 8, 9)
dTAFII60 (SEQ ID NO: 6, 7)
dTAFII80 (SEQ ID NO: 4, 5)
dTAFII110 (SEQ ID NO: 1, 2)
dTAFII150 (SEQ ID NO: 19, 20)
dTAFII250 (SEQ ID NO: 3, 14)

hTAFII30α (SEQ ID NO: 28)
hTAFII30β (SEQ ID NO: 27)
hTAFIMO (SEQ ID NO: 25, 26)
hTAFII70 (SEQ ID NO: 12, 13)
hTAFII100 (SEQ ID NO: 17, 18)
hTAFII130 (SEQ ID NO: 15, 16)
hTAFII250 (SEQ ID NO: 10, 11)
hTAFI48 (SEQ ID NO: 29, 30)
hTAFI110 (SEQ ID NO: 31, 32)

were obtained as described above. Additional methods relating to Pol I TAFs may
be found in Comai et al. (1992) Cell 68, 965-976.
It is evident from the above results that one can use the methods and
compositions disclosed herein for making and identifying diagnostic probes and
therapeutic drugs. It will also be clear to one skilled in the art from a reading of
this disclosure that advantage can be taken to effect alterations of gene expression,
both of genes encoding TAFs and of genes amenable to TAF-mediated transcriptional
modulation. Such alterations can be effected, for example, using a small-molecule
drug identified with the disclosed TAF-based screening assays.

All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application
were specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of the teachings of this
invention that certain changes and modifications may be made thereto without
departing from the spirit or scope of the appended claims.
(i) APPLICANT: Tjian, Robert
Hoey, Timothy
Weinzierl, Robert O.J.

(ii) TITLE OF INVENTION: TATA-Binding Protein Associated
Factors, Nucleic Acids Encoding TAFs, and Methods of Use

(iii) NUMBER OF SEQUENCES:

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert

(B) STREET: 4 Embarcadero Center, 34th Floor

(C) CITY: San Francisco

(D) STATE: CA

(E) COUNTRY: USA

(F) ZIP: 94111

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/USS^i/

(B) FILING DATE: 28-JAN-199^

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Osman, Richard A

(B) REGISTRATION NUMBER: 36,627

(C) REFERENCE/DOCKET NUMBER: FP57650-2RA0

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 415-494-8700

(B) TELEFAX: 415-494-8771


(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:



wo 94/17087 



PCT/US94/01114 



(A) LENGTH: 4615 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 538..3300



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAACTCGTCC GTACCTCGGC GGTCCGTAAA CAATATTTAC TCGGTTTTCG 
GCTAAATCGC 
60 

CAGAGAAACG CAACGGGAAA TCGTTTAAAA TGCGCCCCAG TGCACCGAGT 
TTGAACGCAA 
120 

AATGAATTGA ATGCTCAACA ATCAGTCCGT GCGAGCACGC GCGAGTGTGT 
GTGTGCGCAG 
180 

GAAAACCCGC CGATCGGGAA AAGTGTAGAA AGGCTTAGCG GCGCAAACAA 
AAGGCAGCGA 
240 

ATTAGCGAGA TAACACACAC GCGACAACGA CTGCAACGGA TGCGCCAGGA 
GAAAGGCCGA 
300 

CGACAGTGAC GGCAAAGGCG AGTGCGAGTG AGCCAGCGCA GCACCAATTC 
AGCGGAGCAC 
360 

CCGCTTTTTT GGCCAAGTTC GCTTCTGGAG CGCACAGCAT GCAACAACTC 
CGCCAACACC 
420 

AACACAGGAT GTGCGCAACT AGTTGATCGG AACAGGATCG CTCGCCCACA 
CCAACACACA 
480 

GAAGTCAGTG GAATAGGAGA AACACACTCG CCAATAACAT AAACACCACA 
CA6CACG 
537 

ATG AAC ACC AGC CAG ACA GCT GCC GGC AAT CGC ATC ACC TTC ACC 
AGC 

585 
Met Asn Thr Ser Gln Thr Ala Ala Gly Asn Arg Ile Thr Phe Thr
Ser 

1 5 10 15

CAG CCG CTG CCC AAT GGC ACC ATC AGC ATA GCC GGC AAT CCC GGC 
GCG 

633 

Gln Pro Leu Pro Asn Gly Thr Ile Ser Ile Ala Gly Asn Pro Gly
Ala 

20 25 30 

GTC ATC TCC ACG GCC CAG CTA CCG AAT ACC ACC ACC ATC AAG ACG 
ATC 

681 

Val Ile Ser Thr Ala Gln Leu Pro Asn Thr Thr Thr Ile Lys Thr
Ile
35 40 45 



CAG GCG GGG ATC GGT GGT CAG CAT CAG GGA CTT CAG CAG GTG CAT 
CAT 

729
Gln Ala Gly Ile Gly Gly Gln His Gln Gly Leu Gln Gln Val His

His 

50 55 60 



GTC CAA CAG CAG CAG CAG TCG CAA CAG CAA CAA CAG CAG CAA CAG 
CAG 

777 

Val Gln Gln Gln Gln Gln Ser Gln Gln Gln Gln Gln Gln Gln Gln
Gln

65 70 75 

80 

ACG CAA TCC GCC GGT CAA CCG CTG GTC AAT TCA ATG CTG CCG GCT 
GGC 

825 

Thr Gln Ser Ala Gly Gln Pro Leu Leu Asn Ser Met Leu Pro Ala
Gly
85 90 

95 

GTG GTG GTG GGC ATG CGC CAA CAG GCG CCG TCA CAG CAG CAG CAG 
AAG 

873 

Val Val Val Gly Met Arg Gln Gln Ala Pro Ser Gln Gln Gln Gln
Lys 

100 105 110 
AAT GTG CCC ACC AAC CCG CTC AGT CGC GTG GTG ATC AAC TCC CAC
ATG 

921 

Asn Val Pro Thr Asn Pro Leu Ser Arg Val Val Ile Asn Ser His
Met 

115 120 125 



GCG GGC GTG AGA CCG CAG AGT CCA TCG ATA ACT TTA AGC ACA CTT 
AAT 

969 

Ala Gly Val Arg Pro Gln Ser Pro Ser Ile Thr Leu Ser Thr Leu
Asn 

130 135 140 



ACG GGT CAG ACC CCG GCA TTG CTG GTC AAG ACG GAT AAC GGA TTC 
CAG 
1017 

Thr Gly Gln Thr Pro Ala Leu Leu Val Lys Thr Asp Asn Gly Phe
Gln

145 150 155 

160 

CTG TTG CGC GTG GGC ACG ACG ACG GGT CCG CCG ACG GTG ACA CAG 
ACT 
1065 

Leu Leu Arg Val Gly Thr Thr Thr Gly Pro Pro Thr Val Thr Gln
Thr 

165 170 175 



ATA ACC AAC ACC AGC AAT AAC AGC AAC ACG ACA AGC ACC ACA AAC 
CAT 
1113 

Ile Thr Asn Thr Ser Asn Asn Ser Asn Thr Thr Ser Thr Thr Asn
His 

180 185 190 



CCC ACA ACC ACA CAG ATC CGT CTG CAA ACT GTG CCG GCT GCA GCT 
TCT 
1161 

Pro Thr Thr Thr Gln Ile Arg Leu Gln Thr Val Pro Ala Ala Ala
Ser 

195 200 205 



ATG ACC AAC ACG ACC GCC ACC AGC AAC ATC ATT GTC AAT TCG GTG 
GCA 
1209 

Met Thr Asn Thr Thr Ala Thr Ser Asn Ile Ile Val Asn Ser Val
Ala 
210 215 220



AGC AGT GGA TAT GCA AAC TCT TCG CAG CCG CCG CAT CTG ACG CAA 
CTA 
1257 

Ser Ser Gly Tyr Ala Asn Ser Ser Gln Pro Pro His Leu Thr Gln
Leu 

225 230 235 

240 

AAT GCG CAG GCG CCA CAA CTG CCG CAG ATT ACG CAG ATT CAA ACA 
ATA 
1305 

Asn Ala Gln Ala Pro Gln Leu Pro Gln Ile Thr Gln Ile Gln Thr
Ile

245 250 255 



CCG GCC CAG CAG TCT CAG CAG CAG CAG GTG AAC AAT GTA AGC TCC 
GCG 
1353 

Pro Ala Gln Gln Ser Gln Gln Gln Gln Val Asn Asn Val Ser Ser
Ala 

260 265 270 



GGA GGA ACG GCA ACG GCG GTC AGC AGT ACG ACG GCA GCG ACG ACG 
ACG 
1401 

Gly Gly Thr Ala Thr Ala Val Ser Ser Thr Thr Ala Ala Thr Thr 
Thr 

275 280 285 



CAG CAG GGC AAT ACC AAA GAA AAG TGT CGC AAG TTT CTA GCC AAT 
TTA 
1449 

Gln Gln Gly Asn Thr Lys Glu Lys Cys Arg Lys Phe Leu Ala Asn
Leu 

290 295 300 



ATC GAA TTG TCG ACA CGG GAA CCG AAG CCG GTG GAG AAG AAC GTG 
CGC 
1497 

Ile Glu Leu Ser Thr Arg Glu Pro Lys Pro Val Glu Lys Asn Val
Arg 

305 310 315 

320 

ACC CTC ATC CAG GAG CTG GTC AAT GCG AAT GTC GAG CCG GAG GAG 
TTT 
1545 
Thr Leu Ile Gln Glu Leu Val Asn Ala Asn Val Glu Pro Glu Glu
Phe 

325 330 335 



TGT GAG CGC CTG GAG CGC TTG CTC AAC GCC AGO CCG GAG CCG TGT 
TTG 
1593 

Cys Asp Arg Leu Glu Arg Leu Leu Asn Ala Ser Pro Gln Pro Cys
Leu 

340 345 350 



ATT GGA TTC CTT AAG AAG AGT TTG CCT CTG CTA CGA CAA GCC CTC 
TAC 
1641 

Ile Gly Phe Leu Lys Lys Ser Leu Pro Leu Leu Arg Gln Ala Leu
Tyr 

355 360 365 



ACA AAG GAG CTG GTC ATC GAA GGC ATT AAA CCT CCG CCG CAG CAC 
GTT 
1689 

Thr Lys Glu Leu Val Ile Glu Gly Ile Lys Pro Pro Pro Gln His
Val 

370 375 380 



CTC GGC CTG GCC GGA CTC TCT CAA CAG TTG CCT AAA ATC CAA GCG 
CAA 
1737 

Leu Gly Leu Ala Gly Leu Ser Gln Gln Leu Pro Lys Ile Gln Ala
Gln

385 390 395 

400 

ATC CGT CCG ATC GGT CCT AGC CAG ACA ACG ACC ATT GGA CAG ACG 
CAG 
1785 

Ile Arg Pro Ile Gly Pro Ser Gln Thr Thr Thr Ile Gly Gln Thr
Gln

405 410 415



GTG CGT ATG ATA ACG CCG AAT GCC TTG GGC ACG CCG CGA CCC ACC 
ATT 
1833 

Val Arg Met Ile Thr Pro Asn Ala Leu Gly Thr Pro Arg Pro Thr
Ile

420 425 430 
GGC CAC ACC ACG ATA TCG AAG CAG CCA CCG AAT ATT CGG TTG CCT
ACG 
1881 

Gly His Thr Thr Ile Ser Lys Gln Pro Pro Asn Ile Arg Leu Pro
Thr 

435 440 445 



GGC CCG CGT CTC GTC AAC ACT GGA GGA ATT CGC ACC CAG ATA CCC 
TCG 
1929 

Ala Pro Arg Leu Val Asn Thr Gly Gly Ile Arg Thr Gln Ile Pro
Ser 

450 455 460 

TTG CAG GTG CCT GGT CAG GCG AAC ATT GTG CAA ATA CGT GGA CCG 
CAG 
1977 

Leu Gln Val Pro Gly Gln Ala Asn Ile Val Gln Ile Arg Gly Pro
Gln

465 470 475 

480 

CAT GCT CAG CTG CAG CGT ACT GGA TCG GTC CAG ATC CGG GCC ACC 
ACT 
2025 

His Ala Gln Leu Gln Arg Thr Gly Ser Val Gln Ile Arg Ala Thr
Thr 

485 490 495 



CGT CCG CCA AAC AGT GTG CCC ACC GCG AAC AAA CTC ACT GCC GTC 
AAG 
2073 

Arg Pro Pro Asn Ser Val Pro Thr Ala Asn Lys Leu Thr Ala Val 
Lys 

500 505 510 



GTG GGA CAG ACG CAA ATC AAA GCG ATT ACG CCC AGC CTG CAT CCA 
CCC 
2121 

Val Gly Gln Thr Gln Ile Lys Ala Ile Thr Pro Ser Leu His Pro
Pro 

515 520 525 



TCG CTG GCG GCA ATC TCA GGT GGA CCA CCG CCG ACA CCC ACG CTG 
TCT 
2169 

Ser Leu Ala Ala Ile Ser Gly Gly Pro Pro Pro Thr Pro Thr Leu
Ser 
530 535 540



GTT TTG TCT ACG TTG AAC TCC GCC TCG ACC ACA ACG CTG CCC ATA 
CCA 
2217 

Val Leu Ser Thr Leu Asn Ser Ala Ser Thr Thr Thr Leu Pro Ile
Pro 

545 550 555 

560 

TCG TTA CCC ACG GTC CAC CTT CCC CCC GAA GCT CTT CGA GCC CGT 
GAG 
2265 

Ser Leu Pro Thr Val His Leu Pro Pro Glu Ala Leu Arg Ala Arg 
Glu 

565 570 575 



CAG ATG CAA AAT TCG CTG AAC CAC AAC AGC AAT CAC TTC GAT GCA 
AAA 
2313 

Gln Met Gln Asn Ser Leu Asn His Asn Ser Asn His Phe Asp Ala
Lys 

580 585 590 



CTG GTG GAG ATC AAG GCG CCG TCG CTG CAT CCG CCG CAC ATG GAG 
CGG 
2361 

Leu Val Glu Ile Lys Ala Pro Ser Leu His Pro Pro His Met Glu
Arg 

595 600 605 



ATC AAC GCA TCT CTC ACA CCG ATT GGA GCC AAG ACG ATG GCA AGG 
CCG 
2409 

Ile Asn Ala Ser Leu Thr Pro Ile Gly Ala Lys Thr Met Ala Arg
Pro 

610 615 620 



CCG CCT GCG ATC AAC AAG GCG ATA GGG AAA AAG AAA CGC GAC GCC 
ATG 
2457 

Pro Pro Ala Ile Asn Lys Ala Ile Gly Lys Lys Lys Arg Asp Ala
Met 

625 630 635 

640 

GAA ATG GAC GCC AAA TTG AAC ACA TCG AGC GGA GGA GCG GCG TCC 
GCT 
2505 
Glu Met Asp Ala Lys Leu Asn Thr Ser Ser Gly Gly Ala Ala Ser
Ala 

645 650 655 



GCG AAC TCG TTT TTC CAG CAG AGC TCC ATG TCC TCG ATG TAG GGT 
GAC 
2553 

Ala Asn Ser Phe Phe Gln Gln Ser Ser Met Ser Ser Met Tyr Gly
Asp 

660 665 670 



GAT GAT ATC AAC GAT GTT GCC GCC ATG GGA GGT GTT AAC TTG GCG 
GAG 
2601 

Asp Asp Ile Asn Asp Val Ala Ala Met Gly Gly Val Asn Leu Ala
Glu 

675 680 685 



GAG TCG CAG CGA ATT CTC GGC TGT ACC GAA AAC ATC GGC ACG CAG 
ATT 
2649 

Glu Ser Gln Arg Ile Leu Gly Cys Thr Glu Asn Ile Gly Thr Gln
Ile

690 695 700 



CGA TCC TGC AAA GAT GAG GTT TTT CTT AAT CTC CCC TCG CTG CAA 
GCT 
2697 

Arg Ser Cys Lys Asp Glu Val Phe Leu Asn Leu Pro Ser Leu Gln
Ala 

705 710 715 

720 

AGA ATA CGG GCA ATT ACT TCG GAG GCG GGA CTG GAT GAG CCG TCG 
CAG 
2745 

Arg Ile Arg Ala Ile Thr Ser Glu Ala Gly Leu Asp Glu Pro Ser
Gln

725 730 735 



GAT GTG GCC GTT CTG ATA TCG CAC GCC TGT CAG GAG CGC CTG AAG 
AAC 
2793 

Asp Val Ala Val Leu Ile Ser His Ala Cys Gln Glu Arg Leu Lys
Asn 

740 745 750 
ATC GTT GAG AAG TTG GCT GTG ATA GCG GAG CAC CGC ATT GAT GTC
ATC 
2841 

Ile Val Glu Lys Leu Ala Val Ile Ala Glu His Arg Ile Asp Val
Ile

755 760 765 



AAG TTG GAT CCA CGC TAT GAG CCC GCC AAG GAT GTG CGC GGT CAG 
ATC 
2889 

Lys Leu Asp Pro Arg Tyr Glu Pro Ala Lys Asp Val Arg Gly Gln
Ile

770 775 780 



AAG TTT CTC GAG GAG CTG GAC AAG GCC GAG CAG AAG CGA CAC GAG 
GAA 
2937 

Lys Phe Leu Glu Glu Leu Asp Lys Ala Glu Gln Lys Arg His Glu
Glu 

785 790 795 

800

CTG GAG CGT GAG ATG CTG CTG CQG GCA GCC AAG TCA CGG TCG AGG 
GTG 
2985 

Leu Glu Arg Glu Met Leu Leu Arg Ala Ala Lys Ser Arg Ser Arg 
Val 

805 810 815 



GAA GAT CCC GAG CAG GCC AAG ATG AAG GCG AGG GCC AAG GAG ATG 
CAA 
3033 

Glu Asp Pro Glu Gln Ala Lys Met Lys Ala Arg Ala Lys Glu Met
Gln

820 825 830 

CGC GCC GAA ATG GAG GAG TTG CGT CAA CGA GAT GCC AAT CTG ACG 
GCG 
3081 

Arg Ala Glu Met Glu Glu Leu Arg Gln Arg Asp Ala Asn Leu Thr
Ala 

835 840 845 



CTG CAG GCG ATT GGA CCT CGG AAA AAG CTG AAG CTG GAC GGC GAA 
ACA 
3129 

Leu Gln Ala Ile Gly Pro Arg Lys Lys Leu Lys Leu Asp Gly Glu
Thr 
850 855 860



GTC AGT TCG GGA GCG GGT TCA AGT GGC GGC GGA GTG CTA AGC AGC 
TCG 
3177 

Val Ser Ser Gly Ala Gly Ser Ser Gly Gly Gly Val Leu Ser Ser
Ser 

865 870 875 

880 

GGA TCT GCG CCG ACG ACG TTA CGG CCT CGC ATA AAA CGT GTG AAC 
CTG 
3225 

Gly Ser Ala Pro Thr Thr Leu Arg Pro Arg Ile Lys Arg Val Asn
Leu 

885 890 895 



CGC GAC ATG CTC TTC TAC ATG GAG CAA GAG CGG GAG TTC TGT CGC 
AGT 
3273 

Arg Asp Met Leu Phe Tyr Met Glu Gln Glu Arg Glu Phe Cys Arg
Ser 

900 905 910 



TCC ATG CTG TTC AAG ACA TAC CTC AAG TGATCGCTGC TGTTGCCCAT 
3320 

Ser Met Leu Phe Lys Thr Tyr Leu Lys 
915 920 



CAATCGCACC GTCTTCTCCT CGCCGATCCT CCTACTCCGT GGACTGTCGT 
GTTGTTGTTT 
3380 

TATACAGCTT TACGATTTCA TCCACTTGCA ATATATTTTA GCCTCAACTT 
TAAATGCGTC 
3440 

GCGTGTCCCC TGTTGTTGTT TCTTTTTAGT TAGGCGGCTC TATTTAATTT 
CTATTTTTAC 
3500 

ATTTATTTAC ATAAATCCTA AATTCTAATC GTATTTGATT TTAAGCCTAA 
TTTAAAGCTC 
3560 

GTTTATTTTT CCAATAAATT CTCTGTAAAA CTTAAACCAA ACCAATCCAA 
AAACAAAACA 
3620 
AAACCAGAGT AAACGAAGAG AATAAAATAA TAGAGAGGAA AGTAAAAGAA
GGTAAAAGAG 
3680 

AGCGCGCAGT CAGCGGTCGT TTGATTTGTA ATTTGTAACA TAATAATGTT 
TGCATCAACT 
3740 

GCATTGACGG CCTTATCTAA ACGATATAAA CATAATTATT AATATTTAAT 
TATTTAGCTT 
3800 

AGTTTGTTAA ACGAAAACGA ACCATAATTC CTAGATTTTA AGTAAAAAGC 
AAGGGCGCGT 
3860 

GAAGAGAAAT CGAAACCGAA TTACAGATAA AGGTTTTTAA AACCAACTAG 
ATCGAAACAA 
3920 

GTTCAGCAAC AGCAAAACAA AAGAACACAT CAAAAAAAGA ACCGAAAAAT
ATCCATTTAA 
3980 

ACATCCATTG AATTAGGTTT AGTTGTTTAA AAAAGATGTA ATTTTTAATT 
ACCCATAATG 
4040 

TATAAACGGA AATCAATCGT TAGGCAAGAC CACAACAAAC CCAACAAATT 
GTAAATACAT 
4100 

TCTAGGCTAC GGTTTTTCTA ATAGATAACT AGGTAAAAAC GCAAACGTAA 
TTAACAAATT 
4160 

ATCGATGGCA AGGAGCGATG CGAGCGCAGA CAACTTGGCA CACCGAAAAA 
ATATGTTTTT 
4220 

ATTAGTGGCG CTCGTTCATC CATTAAGAAT GGCGATTCAT TAGGCTCCAT 
AGATCCATAA 
4280 

ATCCCCTAAT CCAATCTGAA CTACACACAA AATAGACAAA TTTTATACAA 
TTAGCTCGAT 
4340 

AAATCTTGTA AAATAGAGTC CCGTAAAAAA TTATAACAAA TAAATTGACA 
ACAATTGATG 
4400 

TAATTCAGTA AACCTAAGCA AAAAGTGAAA CCATTCTAAG CAAATTCTTT 
GTGTGTAAAA 






4460 

ATTAATATGA TAAACAAAAT GCAGATGCAA CCGTAAACAG CGCATAGTTT 
GGTAGGCATA 
4520 

TAACTGAATA TATATATATT ATTATTATTA TGTTTTAACA TTAAGCAAAA 
AAATAAAAGA 
4580 

AAAAATTGAG AAAACTTCAA AAAAAAAAAA AAAAA 
4615 
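Each codon row in the listing above is paired with a three-letter translation row, so damage in either row can be flagged by translating the codons and comparing. The sketch below does this with the standard genetic code (NCBI translation table 1); it is an illustrative check, not part of the PatentIn listing. The example rows are the first 16 codons of SEQ ID NO:1 and the row for residues 337-352, with residue names normalized (Gln, Leu).

```python
"""Cross-check a codon row against its printed three-letter translation.

Uses the standard genetic code (NCBI translation table 1). Anything that
is not a triplet of T/C/A/G (e.g. OCR damage) translates to 'Xaa'.
"""

from itertools import product
from typing import Dict, List, Tuple

BASES = "TCAG"
# One-letter amino acids for all 64 codons, third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}
CODON_TABLE: Dict[str, str] = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(codons: str) -> List[str]:
    """Translate whitespace-separated codons to three-letter residue names."""
    return [CODON_TABLE.get(c.upper(), "Xaa") for c in codons.split()]


def mismatches(codons: str, residues: str) -> List[Tuple[int, str, str, str]]:
    """List (position, codon, translated, printed) wherever the rows disagree.

    Positions are 1-based within the row; zip() stops at the shorter row.
    """
    found = []
    rows = zip(codons.split(), translate(codons), residues.split())
    for pos, (codon, translated, printed) in enumerate(rows, start=1):
        if translated != printed:
            found.append((pos, codon, translated, printed))
    return found


if __name__ == "__main__":
    first_row = "ATG AAC ACC AGC CAG ACA GCT GCC GGC AAT CGC ATC ACC TTC ACC AGC"
    print(" ".join(translate(first_row)))

    # Rows for residues 337-352 as printed in SEQ ID NO:1.
    dna = "TGT GAG CGC CTG GAG CGC TTG CTC AAC GCC AGO CCG GAG CCG TGT TTG"
    protein = "Cys Asp Arg Leu Glu Arg Leu Leu Asn Ala Ser Pro Gln Pro Cys Leu"
    for hit in mismatches(dna, protein):
        print(hit)
```

The first row translates to exactly the printed Met Asn Thr Ser Gln Thr Ala Ala Gly Asn Arg Ile Thr Phe Thr Ser. The second row is flagged at positions 2, 11 and 13; a mismatch shows only that the two rows disagree, not which of them carries the error.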



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 921 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Asn Thr Ser Gln Thr Ala Ala Gly Asn Arg Ile Thr Phe Thr
Ser 

1 5 10 

15 

Gln Pro Leu Pro Asn Gly Thr Ile Ser Ile Ala Gly Asn Pro Gly
Ala 

20 25 30 



Val Ile Ser Thr Ala Gln Leu Pro Asn Thr Thr Thr Ile Lys Thr
Ile

35 40 45 



Gln Ala Gly Ile Gly Gly Gln His Gln Gly Leu Gln Gln Val His
His 

50 55 60 



Val Gln Gln Gln Gln Gln Ser Gln Gln Gln Gln Gln Gln Gln Gln
Gln

65 70 75 

80 

Thr Gln Ser Ala Gly Gln Pro Leu Leu Asn Ser Met Leu Pro Ala
Gly 
85 90 95



Val Val Val Gly Met Arg Gln Gln Ala Pro Ser Gln Gln Gln Gln
Lys 

100 105 110 



Asn Val Pro Thr Asn Pro Leu Ser Arg Val Val Ile Asn Ser His
Met 

115 120 125 



Ala Gly Val Arg Pro Gln Ser Pro Ser Ile Thr Leu Ser Thr Leu
Asn 

130 135 140 



Thr Gly Gln Thr Pro Ala Leu Leu Val Lys Thr Asp Asn Gly Phe
Gln

145 150 155 

160 

Leu Leu Arg Val Gly Thr Thr Thr Gly Pro Pro Thr Val Thr Gln
Thr 

165 170 175 



Ile Thr Asn Thr Ser Asn Asn Ser Asn Thr Thr Ser Thr Thr Asn
His 

180 185 190 



Pro Thr Thr Thr Gln Ile Arg Leu Gln Thr Val Pro Ala Ala Ala
Ser 

195 200 205 



Met Thr Asn Thr Thr Ala Thr Ser Asn Ile Ile Val Asn Ser Val
Ala 

210 215 220 



Ser Ser Gly Tyr Ala Asn Ser Ser Gln Pro Pro His Leu Thr Gln
Leu 

225 230 235 

240 



Asn Ala Gln Ala Pro Gln Leu Pro Gln Ile Thr Gln Ile Gln Thr
Ile

245 250 255 
Pro Ala Gln Gln Ser Gln Gln Gln Gln Val Asn Asn Val Ser Ser
Ala 

260 265 270 



Gly Gly Thr Ala Thr Ala Val Ser Ser Thr Thr Ala Ala Thr Thr 
Thr 

275 280 285 



Gln Gln Gly Asn Thr Lys Glu Lys Cys Arg Lys Phe Leu Ala Asn
Leu 

290 295 300 



Ile Glu Leu Ser Thr Arg Glu Pro Lys Pro Val Glu Lys Asn Val
Arg 

305 310 315 

320 

Thr Leu Ile Gln Glu Leu Val Asn Ala Asn Val Glu Pro Glu Glu
Phe 

325 330 335 



Cys Asp Arg Leu Glu Arg Leu Leu Asn Ala Ser Pro Gln Pro Cys
Leu 

340 345 350 



Ile Gly Phe Leu Lys Lys Ser Leu Pro Leu Leu Arg Gln Ala Leu
Tyr 

355 360 365 



Thr Lys Glu Leu Val Ile Glu Gly Ile Lys Pro Pro Pro Gln His
Val 

370 375 380 



Leu Gly Leu Ala Gly Leu Ser Gln Gln Leu Pro Lys Ile Gln Ala
Gln

385 390 395 

400 

Ile Arg Pro Ile Gly Pro Ser Gln Thr Thr Thr Ile Gly Gln Thr
Gln

405 410 415 



Val Arg Met Ile Thr Pro Asn Ala Leu Gly Thr Pro Arg Pro Thr
Ile

420 425 430 
Gly His Thr Thr Ile Ser Lys Gln Pro Pro Asn Ile Arg Leu Pro
Thr 

435 440 445 



Ala Pro Arg Leu Val Asn Thr Gly Gly Ile Arg Thr Gln Ile Pro
Ser 

450 455 460 



Leu Gln Val Pro Gly Gln Ala Asn Ile Val Gln Ile Arg Gly Pro
Gln

465 470 475 

480 

His Ala Gln Leu Gln Arg Thr Gly Ser Val Gln Ile Arg Ala Thr
Thr 

485 490 495 



Arg Pro Pro Asn Ser Val Pro Thr Ala Asn Lys Leu Thr Ala Val 
Lys 

500 505 510 



Val Gly Gln Thr Gln Ile Lys Ala Ile Thr Pro Ser Leu His Pro
Pro 

515 520 525 



Ser Leu Ala Ala Ile Ser Gly Gly Pro Pro Pro Thr Pro Thr Leu
Ser 

530 535 540 



Val Leu Ser Thr Leu Asn Ser Ala Ser Thr Thr Thr Leu Pro Ile
Pro 

545 550 555 

560 

Ser Leu Pro Thr Val His Leu Pro Pro Glu Ala Leu Arg Ala Arg 
Glu 

565 570 575 



Gln Met Gln Asn Ser Leu Asn His Asn Ser Asn His Phe Asp Ala
Lys 

580 585 590 



Leu Val Glu Ile Lys Ala Pro Ser Leu His Pro Pro His Met Glu
Arg 

595 600 605 
Ile Asn Ala Ser Leu Thr Pro Ile Gly Ala Lys Thr Met Ala Arg
Pro 

610 615 620 



Pro Pro Ala Ile Asn Lys Ala Ile Gly Lys Lys Lys Arg Asp Ala
Met 

625 630 635 

640 

Glu Met Asp Ala Lys Leu Asn Thr Ser Ser Gly Gly Ala Ala Ser 
Ala 

645 650 655 



Ala Asn Ser Phe Phe Gln Gln Ser Ser Met Ser Ser Met Tyr Gly
Asp 

660 665 670 



Asp Asp Ile Asn Asp Val Ala Ala Met Gly Gly Val Asn Leu Ala
Glu 

675 680 685 



Glu Ser Gln Arg Ile Leu Gly Cys Thr Glu Asn Ile Gly Thr Gln
Ile

690 695 700 



Arg Ser Cys Lys Asp Glu Val Phe Leu Asn Leu Pro Ser Leu Gln
Ala 

705 710 715 

720

Arg Ile Arg Ala Ile Thr Ser Glu Ala Gly Leu Asp Glu Pro Ser
Gln

725 730 735 



Asp Val Ala Val Leu Ile Ser His Ala Cys Gln Glu Arg Leu Lys
Asn 

740 745 750 



Ile Val Glu Lys Leu Ala Val Ile Ala Glu His Arg Ile Asp Val
Ile

755 760 765 



Lys Leu Asp Pro Arg Tyr Glu Pro Ala Lys Asp Val Arg Gly Gln
Ile

770 775 780 
Lys Phe Leu Glu Glu Leu Asp Lys Ala Glu Gln Lys Arg His Glu
Glu 

785 790 795 

800 

Leu Glu Arg Glu Met Leu Leu Arg Ala Ala Lys Ser Arg Ser Arg 
Val 

805 810 815 



Glu Asp Pro Glu Gln Ala Lys Met Lys Ala Arg Ala Lys Glu Met
Gln

820 825 830 



Arg Ala Glu Met Glu Glu Leu Arg Gln Arg Asp Ala Asn Leu Thr
Ala 

835 840 845 



Leu Gln Ala Ile Gly Pro Arg Lys Lys Leu Lys Leu Asp Gly Glu
Thr 

850 855 860 



Val Ser Ser Gly Ala Gly Ser Ser Gly Gly Gly Val Leu Ser Ser 
Ser 

865 870 875 

880 

Gly Ser Ala Pro Thr Thr Leu Arg Pro Arg Ile Lys Arg Val Asn
Leu 

885 890 895 



Arg Asp Met Leu Phe Tyr Met Glu Gln Glu Arg Glu Phe Cys Arg
Ser 

900 905 910 



Ser Met Leu Phe Lys Thr Tyr Leu Lys 
915 920 
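The CDS feature of SEQ ID NO:1 (538..3300) and the stated 921-residue length of SEQ ID NO:2 are related by arithmetic on 1-based, inclusive coordinates: 3300 - 538 + 1 = 2763 nucleotides, or 921 codons. In the listing, a TGA codon occupies positions 3301-3303, immediately after the final Lys codon, so the stated span appears to exclude the stop codon. The helper below is an illustrative sketch of this check, not part of the PatentIn format.

```python
"""Relate a 1-based inclusive CDS span to the encoded protein length."""


def cds_codons(start: int, end: int) -> int:
    """Number of codons in a 1-based inclusive span.

    Raises ValueError if the span is not a whole number of codons.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length = end - start + 1
    if length % 3:
        raise ValueError(f"span of {length} nt is not a multiple of 3")
    return length // 3


def residues_encoded(start: int, end: int, includes_stop: bool) -> int:
    """Protein length implied by the span, dropping the stop codon if included."""
    codons = cds_codons(start, end)
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # SEQ ID NO:1, CDS 538..3300; SEQ ID NO:2 is stated to be 921 amino acids.
    n = residues_encoded(538, 3300, includes_stop=False)
    print(n, n == 921)
    # SEQ ID NO:4, CDS 49..2160: 704 codons. Whether a stop codon is
    # included cannot be told from the portion of the listing shown here.
    print(cds_codons(49, 2160))
```

Raising an error for spans that are not a multiple of three catches transcription slips in feature coordinates before they propagate into residue numbering.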

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

TTCGCTGTAC GAGGTACCCG GTCCGAATTC CAAAAGGGCC AACAACTTCA 
CCCGTGACTT 
60 

TCTGCAGGTG TTTATTTACC GCCTGTTCTG GAAAAGTCGC GACAACCCGC 
CCCGCATTCG 
120 

AATGGACGAT ATAAAACAGG CTTTTCCCGC TCATTCCGAG AGCAGCATCC 
GCAAGCGTTT 
180 

AAAGCAGTGC GCTGACTTCA AGCGAACAGG CATGGACTCC AATTGGTGGG 
TTATAAAGCC 
240 

AGAGTTTCGC CTTCCATCCG AGGAGGAGAT CCGAGCCATG GTGTCACCTG 
AGCAGTGTTG 
300 

CGGTACTTCA GCATGATAGC GGCGGAACAA CGCTTAAAGG ATGCTGGGTA 
TGGAGAAAAG 
360 

TTTTTGTTCG CACCTCAGGA AGATGACGAC GAGGAGGCGC AGTGAAAGCT 
TGACGACGAA 
420 

GTAAAGGTGG CTCCTTGGAA CACGACTCGC GCATATATCC AAGCCATGCG 
GGGAAAGTGT 
480 

TTACTCCAGT TGAGTGGTCC AGCCGATCCA ACGGGATGTG GAGAGGGATT 
TTCATATGTT 
540 

CGAGTGCCAA ACAAGCCCAC GCAAACCAAG GAGGAGCAAG AGTCGCAGCC 
TAAACGTTCG 
600 

GTCACAGGAA CAGATGCAGA TTTGCGTCGT CTGCCACTCC AGCGTGCAAA 
AGAGCTGTTG 
660 

CGGCAGTTCA AGGTGCCCGA GGAGGAGATC AAAAAGCTTT CCCGCTGGGA 
GGTCATTGAC 
720 

GTGGTGCGCA CCCTGTCCAC AGAAAAGGCC AAGGCCGGTG AAGAGGGAAT 
GGATAAGTTT 
780 
TCTCGTGGCA ACCGGTTCTC CATTGCAGAG CATCAGGAGC GTTATAAGGA
AGAGTGCCAG 
840 

CGCATATTCG ATCTGCAAAA CAGAGTGCTG GCCAGCTCTG AGGTGCTGTC 
CACAGATGAG 
900 

GCAGAGTCCT CGGCCTCTGA GGAATCTGAT CTCGAAGAAC TTGGCAAGAA 
TCTTGAGAAC 
960 

ATGCTGTCAA ACAAGAAAAC CTCGACGCAA TTGTCAAGGG AACGTGAAGA
GCTGGAGCGT 
1020 

CAGGAGTTGC TTCGCCAGCT TGACGAAGAA CACGGCGGAC CAAGTGGTAG 
TGGAGGAGCC 
1080 

AAGGGAGCCA AAGGAAAGGA TGATCCGGGA GAGCAAATGC TGGCAACCAA 
CAACCAGGGC 
1140 

AGGATCCTTC GCATTACGCG TACCTTTAGA GGTAACGATG GCAAGGAATA 
TACTCGCGTG 
1200 

GAGACTGTGC GGCGGCAACC AGTTATCGAC GCCTACATCA AGATTCGCAC 
CACTAAGGAC 
1260 

GAGCAGTTCA TCAAGCAGTT CGCAACGCTA GATGAGCAGC AGAAGGAGGA 
GATGAAGCGC 
1320 

GAAAAGAGAC GCATTCAGGA GCAGCTACGT CGCATCAAGC GCAACCAGGA 
GCGCGAACGC 
1380 

CTGGCGCAGC TGGCCCAGAA CCAGAAGCTT CAGCCAGGTG GCATGCCCAC 
TTCCTTGGGT 
1440 

GATCCTAAGA GCTCGGGCGG TCATTCGCAC AAGGAGCGGG ATAGCGGCTA 
CAAGGAGGTC 
1500 

AGCCCTTCGC GCAAGAAGTT CAAGCTTAAG CCAGACCTAA AGCTGAAGTG 
CGGCGCCTGT 
1560 

GGACAGGTTG GTCACATGCG CACAAACAAA GCCTGTCCCT TGTATTCTGG 
CATGCAAAGC 
1620

AGTCTGTCCC AGTCGAACCC ATCTCTGGCT GACGATTTTG ACGAACAGAG 
CGAAAAGGAG 
1680 

ATGACAATGG ATGACGATGA TCTTGTGAAT GTCGATGGCA CCAAAGTAAC 
GCTCAGCAGT 
1740 

AAGATTCTCA AGCGTCATGG TGGTGATGAT GGCAAGCGTC GCAGCGGATC 
TAGCTCTGGT 
1800 

TTCACCTTGA AGGTTCCCCG AGATGCGATG GGCAAGAAGA AACGCAGAGT 
GGGTGGCGAT 
1860 

CTTCATTGTG ACTATCTGCA GCGACACAAT AAAACGGCCA ATCGCAGGCG 
CACGGACCCC 
1920 

GTTGTGGTAC TGTCCTCTAT CCTGGAGATT ATCCATAATG AGCTGCGATC 
TATGCCAGAT 
1980 

GTATCGCCAT TCCTGTTCCC GGTAAGCGCA AAAAAGGTTC CCGACTACTA 
CCGCGTGGTG 
2040 

ACCAAGCCCA TGGATCTGCA AACGATGAGG GAGTATATCG CCAAAGGCTA 
ACACGAGTCG 
2100 

CGAGATGTTC CTCGAGGATC TCAAGCAGAT TGTGGACAAC TCGCTGATCT 
ACAATGGACC 
2160 

GCAGAGTGCA TACACCTTGG CTGCCCAACG CATGTTCAGC AGTTGTTTTG 
AATTGCTCGC 
2220 

AGAGGCGAAG ACAAACTGAT GCGCCTCGAG AAGGCAATTA ACCCGCTGCT 
GGACGACGAT 
2280 

GACCAAGTGG CACTCTCCTT TATCTTTGAC AAGCTGCACT CGCAGATTAA 
GCAATTACCA 
2340 

GAGAGCTGGC CTTTCCTTAA GCCTGTCAAC AAGAAACAGG TTAAGGACTA 
CTACACGGTT 
2400 
ATCAAGCGAC CCATGGACCT CGAAACTATC GGCAAAAACA TTGAAGCTCA
TCGCTATCAC 
2460 

AGTCGTGCCG AGTATCTGGC TGATATCGAG TTGATCGCCA CCAACTGTGA 
GCAGTACAAC 
2520 

GGCAGTGACA CCCGCTACAC C/^GTTCTCA AAGAAGATAC TTGAGTATGC 
CCAAACCCAG 
2580 

TTAATTGAGT TTTCGGAGCA CTGCGGCCAG TTGGAAAATA ACATAGCTAA 
GACGCAGGAG 
2640 

CGTGCTAGGG AAAATGCACC AGAGTTTGAT GAAGCCTGGG GCAATGATGA 
TTACAACTTT 
2700 

GACCGTGGCA GTAGGGCCAG TTCACCCGGA GATGACTACA TCGACGTCGA 
GGGTCATGGG 
2760 

GGGCATGCCT CCTCATCGAA CTCTATCCAT CGCAGCATGG GCGCCGAGGC 
CGGTTCGTCA 
2820 

CATACGGCGC CGGCGGTGCG AAAACCAGCT CCTCCTGGTC CTGGTGAGGT
GAAGCGCGGA 
2880 

AGGGGTAGGC CCCGCAAGCA GCGCGACCCC GTGGAGGAGG TCAAATCCCA 
GAATCCGGTT 
2940 

AAGCGTGGTC GGGGGCGTCC GAGGAAGGAC AGCCTTGCCT CAAACATGAG 
TCACACGCAA 
3000 

GCTTACTTCC TGGATGAAGA TCTCCAATGC TCCACAGATG ACGAGGACGA 
CGACGAGGAG 
3060 

GAGGACTTCC AGGAGGTCTC CGAAGACGAG AACAATGCGG CGAGCATTTT 
AGATCAGGGC 
3120 

GAACGTATCA ATGCGCCTGC CGATGCCATG GATGGCATGT TTGACCCCAA 
GAACATCAAG 
3180 

ACAGAGATTG ACCTAGAGGC TCACCAGATG GCAGAGGAGC CGATCGGCGA 
GGATGACAGC 
3240

CAGCAGGTGG CCGAAGCAAT GGTGCAGTTG AGTGGCGTGG GCGGCTACTA 
TGCTCAACAG 
3300 

CAGCAAGATG AATCCATGGA TGTGGACCCC AACTACGATC CCTCAGATTT 
CCTCGCCATG 
3360 

CACAAGCAGC GCCAGAGCCT CGGCGAGCCC AGCAGCTTGC AGGGTGCTTT 
CACCAACTTC 
3420 

CTATCGCACG AGCAGGATGA TAATGGGCCT TACAATCCCG CCGAAGCCAG 
CACAAGTGCC 
3480 

GCTTCCGGTG CAGACTTAGG AATGGACGCT TCAATGGCCA TGCAAATGGC 
GCCGGAAATG 
3540 

CCTGTCAATA CCATGAACAA CGGAATGGGC ATCGATGATG ATCTGGATAT 
TTCGGAGAGT 
3600 

GACGAGGAAG ACGATGGTTC TCGAGTGCGT ATCAAAAAGG AGGTCTTCGA 
CGACGGGGAT 
3660 

TACGCCTTGC AGCACCAGCA GATGGGACAG GCAGCATCGC AGTCGCAGAT 
ATACATGGGG 
3720 



ATTCGTCCAA CGAGCCCACG ACTCTCGACT ACCAGCAACC ACCGCAACTG 
GACTTCCAAC 
3780 

AAGTGCAGGA AATGGAGCAG TTGCAGCACC AAGTGATGCC ACCAATGCAA 
TCAGAGCAAC 
3840 


tJ^ TGCA GCAGCA ACAGACGCCG CAGGAGACAA TGATTATGCC TGGACTTTTT 
' fAGT^ATAGGG 

AATAATTGTT AGTTGTTAGA AAATAAAACG TCGATTTAAT AATAGGATTG 
AGCTTCGCTG 
3960 

TGAAACAATT TTATACACTT TTTACAATGC ATTGTTTTAA CGGATTTTGA 
AATACTACAA 
4020 
TATGTTCTCT GAAAAAATAT TTCCTTTTCA TGCCAATATG TTTTTAATTT
TACACTTTAC 
4080 

AATTTATGAA ATCTAATTCA AAATATGTTT TTAAAATATA ATTTTCATAA 
CTTTAAATAA 
4140 

TGCCTAGAAA AAAAAAAAAA AAAA 
4164 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2359 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 49..2160



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GATAACAAAA TAGTACACAA GTTCCATATA TTTCAATTTT CCGCGAAA ATG 
AGC CTG 
57 

Met 

Ser Leu 

1 



GAA GTG AGC AAT ATC AAC GGG GGA AAC GGT ACT CAA TTG TCC CAC 
GAC 

105 

Glu Val Ser Asn Ile Asn Gly Gly Asn Gly Thr Gln Leu Ser His
Asp 

5 10 15 



AAG CGT GAG CTG CTA TGC CTG CTG AAA CTC ATC AAA AAG TAC CAG 
CTG 

153 

Lys Arg Glu Leu Leu Cys Leu Leu Lys Leu Ile Lys Lys Tyr Gln
Leu 

20 25 30 

35 
AAG AGC ACT GAG GAG CTG CTC TGC CAA GAG GCG AAT GTG AGC AGT
GTG 

201 

Lys Ser Thr Glu Glu Leu Leu Cys Gln Glu Ala Asn Val Ser Ser
Val 

40 45 

50 

GAA TTG TCG GAA ATC AGC GAA AGT GAT GTT GAG CAG GTG CTG GGC 
GCA 

249 

Glu Leu Ser Glu Ile Ser Glu Ser Asp Val Gln Gln Val Leu Gly
Ala 

55 60 65 



GTT TTG GGA GCT GGC GAT GCC AAC CGG GAG CGG AAA CAT GTC CAA 
TCT 

297 

Val Leu Gly Ala Gly Asp Ala Asn Arg Glu Arg Lys His Val Gln
Ser 

70 75 80 

GCG GCG CAG GGT CAT AAA CAG TCC GCG GTG ACG GAG GCC AAT GCT 
GCA 

345 

Pro Ala Gln Gly His Lys Gln Ser Ala Val Thr Glu Ala Asn Ala
Ala 

85 90 95 



GAG GAA CTG GCC AAG TTC ATC GAC GAC GAC AGC TTT GAT GCT CAG 
CAC 

393 

Glu Glu Leu Ala Lys Phe Ile Asp Asp Asp Ser Phe Asp Ala Gln
His 

100 105 110 

115 

TAT GAG CAG GCA TAC AAG GAG CTG CGC ACT TTC GTT GAG GAC TCC 
CTG 

441 

Tyr Glu Gln Ala Tyr Lys Glu Leu Arg Thr Phe Val Glu Asp Ser
Leu 

120 125 130 


GAC ATA TAC AAG CAT GAG CTG TCC ATG GTT CTG TAC CCA ATT CTG 
GTG 

489 

Asp Ile Tyr Lys His Glu Leu Ser Met Val Leu Tyr Pro Ile Leu
Val 






wo 94/17087 



PCT/US94/01114 



135 140 145



CAG ATC TAG TTC AAG ATC CTC GCC AGT GGA CTA AGG GAG AAG GCC 
AAA 

537 

Gln Ile Tyr Phe Lys Ile Leu Ala Ser Gly Leu Arg Glu Lys Ala
Lys 

150 155 160 



GAA TTC ATT GAG AAG TAG AAA TGC GAT CTC GAC GGC TAG TAG ATA 
GAG 

585 

Glu Phe Ile Glu Lys Tyr Lys Cys Asp Leu Asp Gly Tyr Tyr Ile
Glu 

165 170 175 



GGT CTT TTC AAG GTT GTT TTG GTG TCT AAG GCC GAG GAG CTG GTG 
GAG 

633 

Gly Leu Phe Asn Leu Leu Leu Leu Ser Lys Pro Glu Glu Leu Leu 
Glu 

180 185 190 

195 

AAT GAC CTG GTA GTA GCC ATG GAG GAG GAT AAG TTT GTG ATT GGC 
ATG 

681 

Asn Asp Leu Val Val Ala Met Glu Gln Asp Lys Phe Val Ile Arg
Met 

200 205 210 



TGC AGG GAG TCG GAG TCT GTG TTC AAG GGA GAG ATT GAG GAT GGC 
GGG 

729 

Ser Arg Asp Ser His Ser Leu Phe Lys Arg His Ile Gln Asp Arg
Arg 

215 220 225 



CAG GAA GTG GTG GCA GAT ATT GTT TGC AAG TAG TTG CAT TTG GAG 
ACA
777 

Gln Glu Val Val Ala Asp Ile Val Ser Lys Tyr Leu His Phe Asp
Thr 

230 235 240 



TAG GAG GGG ATG GGG GGC AAG AAG GTG GAG TGG GTG GCC ACC GGG 
GGG 

825 
Tyr Glu Gly Met Ala Arg Asn Lys Leu Gln Cys Val Ala Thr Ala
Gly 

245 250 255 

TCG CAC CTC GGA GAG GCC AAG CGA CAG GAC AAC AAA ATG CGG GTG 
TAG 

873 

Ser His Leu Gly Glu Ala Lys Arg Gln Asp Asn Lys Met Arg Val
Tyr 

260 265 270 

275 

TAG GGA CTG CTC AAG GAG GTG GAC TTT CAG ACT CTG ACC ACT CCA 
GCG 

921 

Tyr Gly Leu Leu Lys Glu Val Asp Phe Gln Thr Leu Thr Thr Pro
Ala 

280 285 290 



CCG GCA CCA GAG GAG GAG GAC GAT GAT CCG GAT GCC CCG GAT CGT 
CCG 

969 

Pro Ala Pro Glu Glu Glu Asp Asp Asp Pro Asp Ala Pro Asp Arg 
Pro 

295 300 305 



AAA AAG AAA AAG CCA AAA AAG GAT CCC CTG CTG TCG AAA AAG TCC 
AAG 
1017 

Lys Lys Lys Lys Pro Lys Lys Asp Pro Leu Leu Ser Lys Lys Ser 
Lys 

310 315 320 



TCG GAT CCG AAT GCT CCA TCC ATC GAC AGA ATT CCC CTG CCG GAA 
CTG 
1065 

Ser Asp Pro Asn Ala Pro Ser Ile Asp Arg Ile Pro Leu Pro Glu
Leu 

325 330 335 



AAG GAT TCG GAC AAG TTG CTA AAG CTT AAG GCT CTC AGG GAA GCC 
AGC 
1113 

Lys Asp Ser Asp Lys Leu Leu Lys Leu Lys Ala Leu Arg Glu Ala 
Ser 

340 345 350 

355 






AAG CGT TTA GCC CTC AGC AAG GAT CAA CTG CCC TCT GCC GTC TTC
TAG 
1161 

Lys Arg Leu Ala Leu Ser Lys Asp Gln Leu Pro Ser Ala Val Phe
Tyr 

360 365 370 



ACG GTG CTT AAT TCC CAT CAG GGC GTA ACC TGT GCC GAG ATT TCA 
GAC 
1209 

Thr Val Leu Asn Ser His Gln Gly Val Thr Cys Ala Glu Ile Ser
Asp 

375 380 385 



GAT TCC ACG ATG TTG GCC TGT GGA TTT GGC GAT TCT AGC GTG AGG 
ATT 
1257 

Asp Ser Thr Met Leu Ala Cys Gly Phe Gly Asp Ser Ser Val Arg 
Ile

390 395 400 



TGG TCA TTG ACG CCC GCG AAG CTG CGT ACG CTG AAG GAT GCA GAT 
TCC 
1305 

Trp Ser Leu Thr Pro Ala Lys Leu Arg Thr Leu Lys Asp Ala Asp 
Ser 

405 410 415 



CTT CGC GAA CTG GAC AAG GAA TCG GCG GAT ATC AAT GTG CGT ATG 
CTG 
1353 

Leu Arg Glu Leu Asp Lys Glu Ser Ala Asp Ile Asn Val Arg Met
Leu 

420 425 430 

435 

GAT GAC CGA AGT GGT GAG GTA ACC AGG AGC TTA ATG GGT CAC ACC
GGA 
1401 

Asp Asp Arg Ser Gly Glu Val Thr Arg Ser Leu Met Gly His Thr 
Gly 

440 445 450 



CCC GTA TAC CGC TGT GCC TTT GCC CCC GAG ATG AAC CTG TTG CTC 
TCA 
1449 

Pro Val Tyr Arg Cys Ala Phe Ala Pro Glu Met Asn Leu Leu Leu 
Ser 
455 460 465

TGT TCC GAG GAG AGC ACC ATA AGG CTG TGG TCT CTG CTC ACC TGG
TCC 
1497 

Cys Ser Glu Asp Ser Thr Ile Arg Leu Trp Ser Leu Leu Thr Trp
Ser 

470 475 480 



TGC GTA GTC ACC TAC CGC GGG CAC GTT TAG CCG GTG TGG GAT GTT 
CGC 
1545 

Cys Val Val Thr Tyr Arg Gly His Val Tyr Pro Val Trp Asp Val 
Arg 

485 490 495 

TTT GCG CCG CAT GGC TAC TAT TTT GTT TCT TGT TCG TAC GAC AAA 
ACT 
1593 

Phe Ala Pro His Gly Tyr Tyr Phe Val Ser Cys Ser Tyr Asp Lys 
Thr 

500 505 510 

515 

GCT CGT CTG TGG GCC ACG GAT TCC AAT CAA GCG TTG CGC GTA TTC 
GTG 
1641 

Ala Arg Leu Trp Ala Thr Asp Ser Asn Gln Ala Leu Arg Val Phe
Val 

520 525 530 

GGT CAC TTG TCG GAC GTG GAT TGT GTA CAA TTT CAT CCC AAT TCC 
AAT 
1689 

Gly His Leu Ser Asp Val Asp Cys Val Gln Phe His Pro Asn Ser
Asn 

535 540 545 



TAT GTG GCC ACC GGT TCT AGC GAT CGC ACG GTA CGC CTG TGG GAC 
AAC 
1737 

Tyr Val Ala Thr Gly Ser Ser Asp Arg Thr Val Arg Leu Trp Asp 
Asn 

550 555 560 



ATG ACC GGT CAG TCG GTA CGC CTG ATG ACG GGC CAC AAG GGA TCG 
GTG 
1785 
Met Thr Gly Gln Ser Val Arg Leu Met Thr Gly His Lys Gly Ser
Val
565 570 575 



AGT TCT CTG GCC TTC TCC GCC TGC GGC CGG TAT CTG GCC TCG GCT 
TCA 
1833 

Ser Ser Leu Ala Phe Ser Ala Cys Gly Arg Tyr Leu Ala Ser Gly
Ser
580 585 590 

595 

GTA GAT CAC AAT ATC ATC ATC TGG GAT CTG TCG AAC GGA TCC CTG 
GTC 
1881 

Val Asp His Asn Ile Ile Ile Trp Asp Leu Ser Asn Gly Ser Leu
Val

600 605 610



ACC ACC CTG TTG AGG CAC ACT AGC ACT GTG ACC ACG ATC ACC TTT 
AGT 
1929 

Thr Thr Leu Leu Arg His Thr Ser Thr Val Thr Thr Ile Thr Phe
Ser 

615 620 625 

CGC GAT GGA ACA GTC CTG GCT GCA GCC GGC TTG GAT AAC AAT CTA 
ACT 
1977 

Arg Asp Gly Thr Val Leu Ala Ala Ala Gly Leu Asp Asn Asn Leu 
Thr 

630 635 640

CTG TGG GAC TTT CAC AAG GTT ACC GAA GAC TAT ATC AGC AAT CAP 
ATC 
2025 

Leu Trp Asp Phe His Lys Val Thr Glu Asp Tyr Ile Ser Asn His
Ile

645 650 655 



ACT GTG TCG CAC CAT CAG GAT GAG AAC GAC GAG GAC GTC TAC CTC 
ATG 
2073 

Thr Val Ser His His Gln Asp Glu Asn Asp Glu Asp Val Tyr Leu
Met

660 665 670 

675 
CGT ACT TTC CCC AGC AAG AAC TCG CCA TTT GTC AGC CTG CAC TTT
ACG 
2121 

Arg Thr Phe Pro Ser Lys Asn Ser Pro Phe Val Ser Leu His Phe 
Thr 

680 685 690 



CGC CGA AAT GTC CTG ATG TGC GTG GGT CTA TTC AAG AGT 
TAGGAGCACA 
2170 

Arg Arg Asn Leu Leu Met Cys Val Gly Leu Phe Lys Ser 
695 700 



GATAAGCTTA TTTGGTATAC GTAATGTAGT GTTAAGGAAT GCTCGGAATG 
TTTAGGATTA 
2230 

ATGTTTTGTA TTTCGTTTGT GACCCATCCC CCCTGAAATG TCGATTAGTT 
GTTTAAGCAT 
2290 

AAAAGTGTAA AGTGCATATA TGCGCAAGTT ATCAATAAAT TTTAATTAAT 
ATAAAAGTCA 
2350 

AAAAAAAAA 

2359 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 704 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Ser Leu Glu Val Ser Asn Ile Asn Gly Gly Asn Gly Thr Gln
Leu 

1 5 10 

15 

Ser His Asp Lys Arg Glu Leu Leu Cys Leu Leu Lys Leu Ile Lys
Lys 

20 25 30 
Tyr Gln Leu Lys Ser Thr Glu Glu Leu Leu Cys Gln Glu Ala Asn
Val 

35 40 45 



Ser Ser Val Glu Leu Ser Glu Ile Ser Glu Ser Asp Val Gln Gln
Val 

50 55 60 



Leu Gly Ala Val Leu Gly Ala Gly Asp Ala Asn Arg Glu Arg Lys 
His 

65 70 75 

80 

Val Gln Ser Pro Ala Gln Gly His Lys Gln Ser Ala Val Thr Glu
Ala 

85 90 

95 

Asn Ala Ala Glu Glu Leu Ala Lys Phe Ile Asp Asp Asp Ser Phe
Asp 

100 105 110 



Ala Gln His Tyr Glu Gln Ala Tyr Lys Glu Leu Arg Thr Phe Val
Glu 

115 120 125



Asp Ser Leu Asp Ile Tyr Lys His Glu Leu Ser Met Val Leu Tyr
Pro 

130 135 140 



Ile Leu Val Gln Ile Tyr Phe Lys Ile Leu Ala Ser Gly Leu Arg
Glu 

145 150 155 

160 

Lys Ala Lys Glu Phe Ile Glu Lys Tyr Lys Cys Asp Leu Asp Gly
Tyr 

165 170 175 



Tyr Ile Glu Gly Leu Phe Asn Leu Leu Leu Leu Ser Lys Pro Glu
Glu 

180 185 190 



Leu Leu Glu Asn Asp Leu Val Val Ala Met Glu Gln Asp Lys Phe
Val 

195 200 205 
Ile Arg Met Ser Arg Asp Ser His Ser Leu Phe Lys Arg His Ile
Gln

210 215 220 



Asp Arg Arg Gln Glu Val Val Ala Asp Ile Val Ser Lys Tyr Leu
His 

225 230 235

240 

Phe Asp Thr Tyr Glu Gly Met Ala Arg Asn Lys Leu Gln Cys Val
Ala 

245 250 255 



Thr Ala Gly Ser His Leu Gly Glu Ala Lys Arg Gln Asp Asn Lys
Met 

260 265 270 



Arg Val Tyr Tyr Gly Leu Leu Lys Glu Val Asp Phe Gln Thr Leu
Thr 

275 280 285 



Thr Pro Ala Pro Ala Pro Glu Glu Glu Asp Asp Asp Pro Asp Ala 
Pro 

290 295 300 



Asp Arg Pro Lys Lys Lys Lys Pro Lys Lys Asp Pro Leu Leu Ser 
Lys 

305 310 315 

320 

Lys Ser Lys Ser Asp Pro Asn Ala Pro Ser Ile Asp Arg Ile Pro
Leu 

325 330 335 

Pro Glu Leu Lys Asp Ser Asp Lys Leu Leu Lys Leu Lys Ala Leu 
Arg 

340 345 350 



Glu Ala Ser Lys Arg Leu Ala Leu Ser Lys Asp Gln Leu Pro Ser
Ala 

355 360 365 



Val Phe Tyr Thr Val Leu Asn Ser His Gln Gly Val Thr Cys Ala
Glu 



370 375 380
Ile Ser Asp Asp Ser Thr Met Leu Ala Cys Gly Phe Gly Asp Ser
Ser 

385 390 395 

400 

Val Arg Ile Trp Ser Leu Thr Pro Ala Lys Leu Arg Thr Leu Lys
Asp 

405 410 415 



Ala Asp Ser Leu Arg Glu Leu Asp Lys Glu Ser Ala Asp Ile Asn
Val 

420 425 430 



Arg Met Leu Asp Asp Arg Ser Gly Glu Val Thr Arg Ser Leu Met 
Gly 

435 440 445 



His Thr Gly Pro Val Tyr Arg Cys Ala Phe Ala Pro Glu Met Asn 
Leu 

450 455 460 



Leu Leu Ser Cys Ser Glu Asp Ser Thr Ile Arg Leu Trp Ser Leu
Leu 

465 470 475 

480 

Thr Trp Ser Cys Val Val Thr Tyr Arg Gly His Val Tyr Pro Val 
Trp 

485 490 495 



Asp Val Arg Phe Ala Pro His Gly Tyr Tyr Phe Val Ser Cys Ser 
Tyr 

500 505 510 



Asp Lys Thr Ala Arg Leu Trp Ala Thr Asp Ser Asn Gln Ala Leu
Arg 

515 520 525 



Val Phe Val Gly His Leu Ser Asp Val Asp Cys Val Gln Phe His
Pro 

530 535 540 



Asn Ser Asn Tyr Val Ala Thr Gly Ser Ser Asp Arg Thr Val Arg 
Leu 

545 550 555 

560 
Trp Asp Asn Met Thr Gly Gln Ser Val Arg Leu Met Thr Gly His
Lys 

565 570 575 



Gly Ser Val Ser Ser Leu Ala Phe Ser Ala Cys Gly Arg Tyr Leu
Ala 

580 585 590 



Ser Gly Ser Val Asp His Asn Ile Ile Ile Trp Asp Leu Ser Asn
Gly 

595 600 605 



Ser Leu Val Thr Thr Leu Leu Arg His Thr Ser Thr Val Thr Thr 
Ile

610 615 620 



Thr Phe Ser Arg Asp Gly Thr Val Leu Ala Ala Ala Gly Leu Asp 
Asn 

625 630 635 

640 

Asn Leu Thr Leu Trp Asp Phe His Lys Val Thr Glu Asp Tyr Ile
Ser 

645 650 655 



Asn His Ile Thr Val Ser His His Gln Asp Glu Asn Asp Glu Asp
Val 

660 665 670 



Tyr Leu Met Arg Thr Phe Pro Ser Lys Asn Ser Pro Phe Val Ser 
Leu 

675 680 685 



His Phe Thr Arg Arg Asn Leu Leu Met Cys Val Gly Leu Phe Lys 
Ser 

690 695 700 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2018 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA



(ix) FEATURE: 

(A) NAME/KEY: CDS
(B) LOCATION: 70.. 1842 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GGAATTCGAG TTGGCCAAAG TGGCGCAATC CGGTATCAAT TGTTCAAACC 
GAGCAGCCCC 

TCCAGCAGC ATG CTG TAC GGC TCC AGC ATC TCG GCG GAG TCC ATG 
AAG 

108 

Met Leu Tyr Gly Ser Ser Ile Ser Ala Glu Ser Met

Lys 

1 5 10 



GTG ATC GCG GAG AGC ATC GGA GTG GGC TCC CTG TCG GAT GAC GCC 
GGC 

156 

Val Ile Ala Glu Ser Ile Gly Val Gly Ser Leu Ser Asp Asp Ala

Ala

15 20 25 



AAG GAA CTA GCG GAG GAT GTG TCC ATC AAG CTG AAG AGG ATT GTA 
CAG 

204 

Lys Glu Leu Ala Glu Asp Val Ser Ile Lys Leu Lys Arg Ile Val
Gln

30 35 40 

45

GAT GCG GCC AAG TTC ATG AAC CAC GCC AAG CGG CAG AAG CTC TCA 
GTG 

252 

Asp Ala Ala Lys Phe Met Asn His Ala Lys Arg Gln Lys Leu Ser
Val 

50 55 

60 

CGG GAC ATC GAC ATG TCC CTT AAG GTG CGA AAT GTG GAG CCG CAG 
TAC 

300 

Arg Asp Ile Asp Met Ser Leu Lys Val Arg Asn Val Glu Pro Gln
Tyr 

65 70 75 
GGT TTC GTA GCC AAG GAC TTC ATT CCA CTC CGC TTC GCA TCT GGC
GGA 

348 

Gly Phe Val Ala Lys Asp Phe Ile Pro Leu Arg Phe Ala Ser Gly
Gly 

80 85 90 



GGA CGG GAG CTG CAC TTC ACC GAG GAC AAG GAA ATC GAC CTA GGA 
GAA 

396 

Gly Arg Glu Leu His Phe Thr Glu Asp Lys Glu Ile Asp Leu Gly
Glu 

95 100 105 



ATC ACA TCC ACC AAC TCT GTA AAA ATT CCC CTG GAT CTC ACC CTG 
CGC 

444 

Ile Thr Ser Thr Asn Ser Val Lys Ile Pro Leu Asp Leu Thr Leu
Arg 

110 115 120 

125 

TCC CAT TGG TTT GTT GTG GAG GGA GTG CAA CCC ACT GTG CCC GAA 
AAC 

492 

Ser His Trp Phe Val Val Glu Gly Val Gln Pro Thr Val Pro Glu
Asn 

130 135 140 



CCC CCT CCG CTC TCG AAG GAT TCC CAG TTA CTG GAC TCG GTC AAT 
CCA 

540 

Pro Pro Pro Leu Ser Lys Asp Ser Gln Leu Leu Asp Ser Val Asn
Pro 

145 150 155 



GTT ATT AAG ATG GAT CAA GGC CTA AAC AAA GAT GCG GCA GGC AAA 
CCC 

588 

Val Ile Lys Met Asp Gln Gly Leu Asn Lys Asp Ala Ala Gly Lys
Pro 

160 165 170 



ACC ACC GGC AAG ATA CAC AAG CTG AAA AAC GTG GAG ACC ATT CAT 
GTC 

636 

Thr Thr Gly Lys Ile His Lys Leu Lys Asn Val Glu Thr Ile His
Val 
175 180 185



AAG CAA CTG GCC ACQ CAC GAG TTG TCC GTG GAG CAG CAG TTG TAG 
TAG 

684 

Lys Gln Leu Ala Thr His Glu Leu Ser Val Glu Gln Gln Leu Tyr
Tyr 

190 195 200 

205 

AAG GAG ATC ACC GAG GCG TGC GTG GGA TCT GAT GAG CCG CGG CGC 
GGG 

732 

Lys Glu Ile Thr Glu Ala Cys Val Gly Ser Asp Glu Pro Arg Arg
Gly 

210 215 220 



GAA GCG CTG CAG TCG CTG GGA TCC GAT CCT GGC CTG CAC CAA ATG 
CTT 

780 

Glu Ala Leu Gln Ser Leu Gly Ser Asp Pro Gly Leu His Glu Met
Leu 

225 230 235 



CCC CGC ATG TGC ACC TTC ATT GCC GAG GGA GTT AAG GTC AAT GTG 
GTT 

828 

Pro Arg Met Cys Thr Phe Ile Ala Glu Gly Val Lys Val Asn Val
Val 

240 245 250 



CAG AAC AAC TTG GCG TTG CTT ATT TAC CTC ATG CGC ATG GTT CGT 
GCG 

876 

Gln Asn Asn Leu Ala Leu Leu Ile Tyr Leu Met Arg Met Val Arg
Ala 

255 260 265 



CTT CTG GAT AAT CCT TCG CTG TTT CTG GAG AAA TAC CTC CAC GAA 
CTG 

924 

Leu Leu Asp Asn Pro Ser Leu Phe Leu Glu Lys Tyr Leu His Glu 
Leu 

270 275 280 

285 

ATA CCC TCG GTG ATG ACG TGC ATT GTG TCC AAA CAG CTG TGT ATG 
CGC 

972 
Ile Pro Ser Val Met Thr Cys Ile Val Ser Lys Gln Leu Cys Met
Arg 

290 295 300 



CCC GAG CTG GAC AAT CAC TGG GCC CTG CGA GAG TTT GCC TCC CGA 
CTG 
1020 

Pro Glu Leu Asp Asn His Trp Ala Leu Arg Asp Phe Ala Ser Arg 
Leu 

305 310 315 



ATG GCT CAA ATC TGC AAG AAC TTC AAT ACC CTA ACC AAC AAT CTG 
CAA 
1068 

Met Ala Gln Ile Cys Lys Asn Phe Asn Thr Leu Thr Asn Asn Leu
Gln

320 325 330 



ACC CGT GTC ACC CGC ATC TTC AGC AAG GCC CTG CAG AAC GAC AAG 
ACC 
1116 

Thr Arg Val Thr Arg Ile Phe Ser Lys Ala Leu Gln Asn Asp Lys
Thr 

335 340 345 



CAC CTG TCC TCG CTT TAC GGC TCT ATT GCG GGT CTC TCG GAG CTG 
GGG 
1164 

His Leu Ser Ser Leu Tyr Gly Ser Ile Ala Gly Leu Ser Glu Leu
Gly 

350 355 360 

365 

GGC GAA GTC ATA AAG GTT TTC ATC ATA CCC CGC CTT AAG TTC ATA 
TCG 
1212 

Gly Glu Val Ile Lys Val Phe Ile Ile Pro Arg Leu Lys Phe Ile
Ser 

370 375 380 



GAG CGC ATT GAA CCT CAC CTG CTC GGC ACC TCC ATC AGC AAC ACT 
GAC 
1260 

Glu Arg Ile Glu Pro His Leu Leu Gly Thr Ser Ile Ser Asn Thr
Asp 

385 390 395 
AAG ACA GCA GCA GGT CAC ATC CGC GCC ATG CTT CAG AAG TGC TGT
CCC 
1308 

Lys Thr Ala Ala Gly His Ile Arg Ala Met Leu Gln Lys Cys Cys
Pro 

400 405 410 



CCG ATT CTC AGG CAA ATG CTC AGC GCC AGA TAG AGC GGA GGA CTA
CAA
1356 

Pro Ile Leu Arg Gln Met Leu Ser Ala Arg Tyr Ser Gly Gly Leu
Gln

415 420 425 



GAA CGA CTT TGG CTT CCT GGG GCC GTC GCT GTG CCA GGC GTA GTC
AAA 
1404 

Glu Arg Leu Trp Leu Pro Gly Ala Val Ala Val Pro Gly Val Val 
Lys 

430 435 440 

445 

GTT CGA AAT GCG CCC GCC TCA AGC ATT GTA ACC CTG TCA TCC AAC 
ACT 
1452 

Val Arg Asn Ala Pro Ala Ser Ser Ile Val Thr Leu Ser Ser Asn
Thr 

450 455 460 



ATC AAC ACG GCA CCC ATC ACG AGT GCA GCA CAA ACA GCA ACA ACC 
ATC 
1500 

Ile Asn Thr Ala Pro Ile Thr Ser Ala Ala Gln Thr Ala Thr Thr
Ile

465 470 475 



GGA CGA GTG TCC ATG CCC ACC ACA CAG AGA CAG GGA AGT CCC GGA 
GTC 
1548 

Gly Arg Val Ser Met Pro Thr Thr Gln Arg Gln Gly Ser Pro Gly
Val 

480 485 490 



TCG TCC CTG CCG CAA ATA AGA GCC ATT CAG GCC AAC CAG CCG GCG 
CAA 
1596 

Ser Ser Leu Pro Gln Ile Arg Ala Ile Gln Ala Asn Gln Pro Ala
Gln
495 500 505

AAG TTT GTG ATA GTC ACC CAG AAC TCG CCG CAG CAG GGC CAG GCG 
AAG 
1644 

Lys Phe Val Ile Val Thr Gln Asn Ser Pro Gln Gln Gly Gln Ala
Lys 

510 515 520 

525 

GTG GTG CGG CGT GGC AGC TCT CCG CAC AGC GTG GTC CTC TCC GCG 
GCC 
1692 

Val Val Arg Arg Gly Ser Ser Pro His Ser Val Val Leu Ser Ala 
Ala 

530 535 540 



TCC AAC GCT GCC ACT GCC TCC AAT TCG AAC TCA AGC TCG AGC GGC 
AGT 
1740 

Ser Asn Ala Ala Ser Ala Ser Asn Ser Asn Ser Ser Ser Ser Gly 
Ser 

545 550 555 

CTA CTA GCG GCT GCA CAG CGG AGC AGC GAG AAT GTG TGT GTT ATT 
GCC 
1788 

Leu Leu Ala Ala Ala Gln Arg Ser Ser Glu Asn Val Cys Val Ile
Ala 

560 565 570 

GGT AGC GAA GCG CCA GCA GTT GAT GGT ATA ACA GTT CAA TCT TTC 
AGA 
1836 

Gly Ser Glu Ala Pro Ala Val Asp Gly Ile Thr Val Gln Ser Phe
Arg 

575 580 585 

GCA TCC TAGACGCCAA CTCGCTGATC ATTGAGACGG AGATTGTGCG 
CGCACCGGCC 

1892 
Ala Ser 

590 

CGAGCTGGCG GATCTCTCGC ACCTGGAGTA GCCAGCTTAG TTCGTAGTCC 
ACATTTTGTC 
1952 
ATATTGTATG CAATAAAATA AAAAATGCGG GTTCCTACCC CAAAAAAATG
TAAAAAAAAA 
2012 

AAAAAA 

2018 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 591 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Met Leu Tyr Gly Ser Ser Ile Ser Ala Glu Ser Met Lys Val Ile
Ala 

1 5 10 

15 

Glu Ser Ile Gly Val Gly Ser Leu Ser Asp Asp Ala Ala Lys Glu
Leu 

20 25 30 



Ala Glu Asp Val Ser Ile Lys Leu Lys Arg Ile Val Gln Asp Ala
Ala 

35 40 45 



Lys Phe Met Asn His Ala Lys Arg Gln Lys Leu Ser Val Arg Asp
Ile

50 55 60 



Asp Met Ser Leu Lys Val Arg Asn Val Glu Pro Gln Tyr Gly Phe
Val 

65 70 75 

80 

Ala Lys Asp Phe Ile Pro Leu Arg Phe Ala Ser Gly Gly Gly Arg
Glu 

85 90

95 

Leu His Phe Thr Glu Asp Lys Glu Ile Asp Leu Gly Glu Ile Thr
Ser 

100 105 110 
Thr Asn Ser Val Lys Ile Pro Leu Asp Leu Thr Leu Arg Ser His
Trp 

115 120 125 



Phe Val Val Glu Gly Val Gln Pro Thr Val Pro Glu Asn Pro Pro
Pro 

130 135 140 



Leu Ser Lys Asp Ser Gln Leu Leu Asp Ser Val Asn Pro Val Ile
Lys 

145 150 155 

160 

Met Asp Gln Gly Leu Asn Lys Asp Ala Ala Gly Lys Pro Thr Thr
Gly 

165 170 175 



Lys Ile His Lys Leu Lys Asn Val Glu Thr Ile His Val Lys Gln
Leu 

180 185 190 



Ala Thr His Glu Leu Ser Val Glu Gln Gln Leu Tyr Tyr Lys Glu
Ile

195 200 205 



Thr Glu Ala Cys Val Gly Ser Asp Glu Pro Arg Arg Gly Glu Ala
Leu 

210 215 220 



Gln Ser Leu Gly Ser Asp Pro Gly Leu His Glu Met Leu Pro Arg
Met 

225 230 235 

240 

Cys Thr Phe Ile Ala Glu Gly Val Lys Val Asn Val Val Gln Asn
Asn 

245 250 255 



Leu Ala Leu Leu Ile Tyr Leu Met Arg Met Val Arg Ala Leu Leu
Asp 

260 265 270 



Asn Pro Ser Leu Phe Leu Glu Lys Tyr Leu His Glu Leu Ile Pro
Ser 

275 280 285 
Val Met Thr Cys Ile Val Ser Lys Gln Leu Cys Met Arg Pro Glu
Leu 

290 295 300 



Asp Asn His Trp Ala Leu Arg Asp Phe Ala Ser Arg Leu Met Ala 
Gln

305 310 315 

320 

Ile Cys Lys Asn Phe Asn Thr Leu Thr Asn Asn Leu Gln Thr Arg
Val 

325 330 335 



Thr Arg Ile Phe Ser Lys Ala Leu Gln Asn Asp Lys Thr His Leu
Ser 

340 345 350 



Ser Leu Tyr Gly Ser Ile Ala Gly Leu Ser Glu Leu Gly Gly Glu
Val 

355 360 365 



Ile Lys Val Phe Ile Ile Pro Arg Leu Lys Phe Ile Ser Glu Arg
Ile

370 375 380 



Glu Pro His Leu Leu Gly Thr Ser Ile Ser Asn Thr Asp Lys Thr
Ala 

385 390 395 

400 

Ala Gly His Ile Arg Ala Met Leu Gln Lys Cys Cys Pro Pro Ile
Leu 

405 410 415 



Arg Gln Met Leu Ser Ala Arg Tyr Ser Gly Gly Leu Gln Glu Arg
Leu 

420 425 430 



Trp Leu Pro Gly Ala Val Ala Val Pro Gly Val Val Lys Val Arg 
Asn 

435 440 445 



Ala Pro Ala Ser Ser Ile Val Thr Leu Ser Ser Asn Thr Ile Asn
Thr 

450 455 460 
Ala Pro Ile Thr Ser Ala Ala Gln Thr Ala Thr Thr Ile Gly Arg
Val 

465 470 475 

480 

Ser Met Pro Thr Thr Gln Arg Gln Gly Ser Pro Gly Val Ser Ser
Leu 

485 490 495 



Pro Gln Ile Arg Ala Ile Gln Ala Asn Gln Pro Ala Gln Lys Phe
Val 

500 505 510 



Ile Val Thr Gln Asn Ser Pro Gln Gln Gly Gln Ala Lys Val Val
Arg 

515 520 525 



Arg Gly Ser Ser Pro His Ser Val Val Leu Ser Ala Ala Ser Asn 
Ala 

530 535 540 



Ala Ser Ala Ser Asn Ser Asn Ser Ser Ser Ser Gly Ser Leu Leu 
Ala 

545 550 555 

560 

Ala Ala Gln Arg Ser Ser Glu Asn Val Cys Val Ile Ala Gly Ser
Glu 

565 570 575 



Ala Pro Ala Val Asp Gly Ile Thr Val Gln Ser Phe Arg Ala Ser
580 585 590 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1120 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 80.. 913 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GATATGTACG TGCACAATTT CAATGGAATA AACAATCTTC TTGCAGCAAA 
GCCGACGTAA 
60 

ACATAATAAC TATAGAAGT ATG AGC GCA GAG AAG TCC GAT AAG GCC 
AAG ATC 
112 

Met Ser Ala Glu Lys Ser Asp Lys Ala

Lys Ile



10 



ACT GCC CAA ATC AAG CAC GTG CCG AAG GAC GCG CAG GTG ATC ATG 
TCC 

160 

Ser Ala Gln Ile Lys His Val Pro Lys Asp Ala Gln Val Ile Met
Ser 

15 20 25 



ATC CTG AAG GAG CTG AAT GTC CAG GAG TAC GAG CCG CGC GTG GTC 
AAC 

208 

Ile Leu Lys Glu Leu Asn Val Gln Glu Tyr Glu Pro Arg Val Val
Asn 

30 35 40 



CAA CTG CTG GAG TTC ACC TTC CGC TAT GTC ACC TGC ATT CTG GAC 
GAC 

256 

Gln Leu Leu Glu Phe Thr Phe Arg Tyr Val Thr Cys Ile Leu Asp
Asp 

45 50 55 



GCC AAG GTA TAC GCC AAC CAT GCG CGC AAG AAG ACC ATC GAC TTG
GAC 

304 

Ala Lys Val Tyr Ala Asn His Ala Arg Lys Lys Thr Ile Asp Leu
Asp 

60 65 70 

75 

GAC GTG CGT CTG GCC ACC GAG GTT ACG CTG GAC AAG AGC TTC ACC 
GGG 

352 

Asp Val Arg Leu Ala Thr Glu Val Thr Leu Asp Lys Ser Phe Thr 
Gly 
80 85

90 

CCG TTG GAG CGC CAC GTT CTA GCC AAG GTG GCC GAC GTG CGC AAC 
AGC 

400 

Pro Leu Glu Arg His Val Leu Ala Lys Val Ala Asp Val Arg Asn 
Ser 

95 100 105



ATG CCC CTG CCA CCC ATT AAG CCG CAC TGC GGT CTC CGA CTG CCG 
CCC 

448 

Met Pro Leu Pro Pro Ile Lys Pro His Cys Gly Leu Arg Leu Pro
Pro 

110 115 120 



GAC CGC TAC TGT CTC ACC GGC GTC AAC TAC AAA CTG CGG GCC ACT 
AAT 

496 

Asp Arg Tyr Cys Leu Thr Gly Val Asn Tyr Lys Leu Arg Ala Thr 
Asn 

125 130 135



CAG CCC AAG AAA ATG ACC AAG TCG GCG GTG GAG GGC CGT CCA CTG 
AAG 

544 

Gln Pro Lys Lys Met Thr Lys Ser Ala Val Glu Gly Arg Pro Leu
Lys 

140 145 150 

155 

ACC GTC GTT AAG GCC GTC TCC AGC GCC AAT GGT CCG AAG AGG CCA 
CAC 

592 

Thr Val Val Lys Pro Val Ser Ser Ala Asn Gly Pro Lys Arg Pro 
His 

160 165 170 



TCC GTG GTG GCC AAG CAG CAG GTG GTG ACC ATT CCC AAG CCC GTC 
ATC 

640 

Ser Val Val Ala Lys Gln Gln Val Val Thr Ile Pro Lys Pro Val
Ile

175 180 185 



AAG TTT ACC ACC ACT ACG ACA ACG AAA ACG GTG GGC AGC TCC GGC 
GGA 

688 
Lys Phe Thr Thr Thr Thr Thr Thr Lys Thr Val Gly Ser Ser Gly
Gly 

190 195 200 



TCT GGG GGC GGC GGT GGT CAG GAG GTT AAG AGC GAG AGC ACC GGC
GCC 

736 

Ser Gly Gly Gly Gly Gly Gln Glu Val Lys Ser Glu Ser Thr Gly
Ala 

205 210 215 



GGC GGA GAT CTC AAG ATG GAG GTG GAC AGC GAT GCG GCG GCC GTG 
GGC 

784 

Gly Gly Asp Leu Lys Met Glu Val Asp Ser Asp Ala Ala Ala Val 
Gly 

220 225 230 

235 

AGC ATC GCT GGC GCA TCC GGT TCG GGA GCA GGA AGT GCC AGC GGA 
GGA 

832 

Ser Ile Ala Gly Ala Ser Gly Ser Gly Ala Gly Ser Ala Ser Gly
Gly 

240 245 250 



GGA GGA GGA GGA GGA TCA TCT GGC GTT GGA GTG GCC GTC AAG CGG 
GAA 

880 

Gly Gly Gly Gly Gly Ser Ser Gly Val Gly Val Ala Val Lys Arg 
Glu 

255 260 265 



CGT GAG GAG GAG GAG TTT GAG TTT GTG ACC AAC TAGCGAAACG 
ACATCATTTA 
933 

Arg Glu Glu Glu Glu Phe Glu Phe Val Thr Asn 



270 275 



CCTTAAATTA ATATTCTTAA ATCAGACCAA AGCACTTGCA TTTGGTTGAG 
CGAACTGGGG 
993 

GTCTAAATTT CAACTCGAAT GTGAAGTCCC AAAAACCTTA GTATAGATTC 
GCCCGTTAAT 
1053 
CATTATGAAA TCTACGTTTT ATACACAAAT ACAACTACCA GATTTTCATA
TTAAAAAAAA 
1113 

AAAAAAA

1120 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 278 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Ser Ala Glu Lys Ser Asp Lys Ala Lys Ile Ser Ala Gln Ile
Lys 

1 5 10 15

His Val Pro Lys Asp Ala Gln Val Ile Met Ser Ile Leu Lys Glu
Leu 

20 25 30 



Asn Val Gln Glu Tyr Glu Pro Arg Val Val Asn Gln Leu Leu Glu
Phe 

35 40 45 



Thr Phe Arg Tyr Val Thr Cys Ile Leu Asp Asp Ala Lys Val Tyr
Ala 

50 55 60 



Asn His Ala Arg Lys Lys Thr Ile Asp Leu Asp Asp Val Arg Leu
Ala 

65 70 75 

80 

Thr Glu Val Thr Leu Asp Lys Ser Phe Thr Gly Pro Leu Glu Arg 
His 

85 90 

95 

Val Leu Ala Lys Val Ala Asp Val Arg Asn Ser Met Pro Leu Pro
Pro 

100 105 110 
Ile Lys Pro His Cys Gly Leu Arg Leu Pro Pro Asp Arg Tyr Cys
Leu

115 120 125

Thr Gly Val Asn Tyr Lys Leu Arg Ala Thr Asn Gln Pro Lys Lys
Met

130 135 140



Thr Lys Ser Ala Val Glu Gly Arg Pro Leu Lys Thr Val Val Lys 

Pro

145 150 155 

160 

Val Ser Ser Ala Asn Gly Pro Lys Arg Pro His Ser Val Val Ala
Lys

165 170 175



Gln Gln Val Val Thr Ile Pro Lys Pro Val Ile Lys Phe Thr Thr
Thr

180 185 190

Thr Thr Thr Lys Thr Val Gly Ser Ser Gly Gly Ser Gly Gly Gly
Gly
195 200 205 

Gly Gln Glu Val Lys Ser Glu Ser Thr Gly Ala Gly Gly Asp Leu
Lys

210 215 220

Met Glu Val Asp Ser Asp Ala Ala Ala Val Gly Ser Ile Ala Gly
Ala 

225 230 235 

240 

Ser Gly Ser Gly Ala Gly Ser Ala Ser Gly Gly Gly Gly Gly Gly
Gly

245 250 255

Ser Ser Gly Val Gly Val Ala Val Lys Arg Glu Arg Glu Glu Glu
Glu
260 265 270 



Phe Glu Phe Val Thr Asn 
275 



(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5962 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 14.. 5692 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

TTATTTCCGG CAT ATG GGA CCC GGC TGC GAT TTG CTG CTG CGG ACA 
GCA 

49 

Met Gly Pro Gly Cys Asp Leu Leu Leu Arg Thr 

Ala 

1 5 10 



GCT ACC ATC ACT GCT GCC GCC ATC ATG TCA GAC ACG GAC AGC GAC
GAA 

97

Ala Thr Ile Thr Ala Ala Ala Ile Met Ser Asp Thr Asp Ser Asp
Glu 

15 20 25 



GAT TCC GCT GGA GGC GGC CCA TTT TCT TTA GCG GGT TTC CTT TTC 
GGC 

145 

Asp Ser Ala Gly Gly Gly Pro Phe Ser Leu Ala Gly Phe Leu Phe 
Gly 

30 35 40 



AAC ATC AAT GGA GCC GGG CAG CTG GAG GGG GAA AGC GTC TTG GAT 
GAT 

193 

Asn Ile Asn Gly Ala Gly Gln Leu Glu Gly Glu Ser Val Leu Asp
Asp 

45 50 55 

60 

GAA TGT AAG AAG CAC TTG GCA GGC TTG GGG GCT TTG GGG CTG GGC 
AGC 

241 

Glu Cys Lys Lys His Leu Ala Gly Leu Gly Ala Leu Gly Leu Gly 
Ser 
65 70

75 

CTG ATC ACT GAA CTC ACG GCA AAT GAA GAA TTG ACC GGG ACT GAC 
GGT 

289 

Leu Ile Thr Glu Leu Thr Ala Asn Glu Glu Leu Thr Gly Thr Asp
Gly 

80 85 90 



GCC TTG GTA AAT GAT GAA GGG TGG GTT AGG AGT ACA GAA GAT GCT
GTG 

337 

Ala Leu Val Asn Asp Glu Gly Trp Val Arg Ser Thr Glu Asp Ala 
Val 

95 100 105 



GAC TAT TCA GAC ATC AAT GAG GTG GCA GAA GAT GAA AGC CGA AGA 
TAC 

385 

Asp Tyr Ser Asp Ile Asn Glu Val Ala Glu Asp Glu Ser Arg Arg
Tyr 

110 115 120 



CAG CAG ACG ATG GGG AGC TTG CAG CCC CTT TGC CAC TCA GAT TAT 
GAT 

433 

Gln Gln Thr Met Gly Ser Leu Gln Pro Leu Cys His Ser Asp Tyr
Asp

125 130 135 140

GAA GAT GAC TAT GAT GCT GAT TGT GAA GAC ATT GAT TGC AAG TTG
ATG
481 

Glu Asp Asp Tyr Asp Ala Asp Cys Glu Asp Ile Asp Cys Lys Leu
Met 

145 150 155 



CCT CCT CCA CCT CCA CCC CCG GGA CCA ATG AAG AAG GAT AAG GAC
CAG 

529 

Pro Pro Pro Pro Pro Pro Pro Gly Pro Met Lys Lys Asp Lys Asp
Gln

160 165 170

GAT TCT ATT ACT GGT GTG TCT GAA AAT GGA GAA GGC ATC ATC TTG

CCC

577 
Asp Ser Ile Thr Gly Val Ser Glu Asn Gly Glu Gly Ile Ile Leu
Pro 

175 180 185 



TCC ATC ATT GCC CCT TCC TCT TTG GCC TCA GAG AAA GTG GAC TTC
AGT 

625 

Ser Ile Ile Ala Pro Ser Ser Leu Ala Ser Glu Lys Val Asp Phe
Ser 

190 195 200 



AGT TCC TCT GAC TCA GAA TCT GAG ATG GGA CCT CAG GAA GCA ACA 
CAG 

673 

Ser Ser Ser Asp Ser Glu Ser Glu Met Gly Pro Gln Glu Ala Thr
Gln

205 210 215 

220

GCA GAA TCT GAA GAT GGA AAG CTG ACC CTT CCA TTG GCT GGG ATT 
ATG 

721 

Ala Glu Ser Glu Asp Gly Lys Leu Thr Leu Pro Leu Ala Gly Ile
Met 

225 230 235 



CAG CAT GAT GCC ACC AAG CTG TTG CCA AGT GTC ACA GAA CTT TTT 
CCA 

769 

Gln His Asp Ala Thr Lys Leu Leu Pro Ser Val Thr Glu Leu Phe
Pro 

240 245 250 



GAA TTT CGA CCT GGA AAG GTG TTA CGT TTT CTA CGT CTT TTT GGA 
CCA 

817 

Glu Phe Arg Pro Gly Lys Val Leu Arg Phe Leu Arg Leu Phe Gly 
Pro 

255 260 265 



GGG AAG AAT GTC CCA TCT GTT TGG CGG AGT GCT CGG AGA AAG AGG 
AAG 

865 

Gly Lys Asn Val Pro Ser Val Trp Arg Ser Ala Arg Arg Lys Arg 
Lys 

270 275 280 
AAG AAG CAC CGT GAG CTG ATA CAG GAA GAG CAG ATC CAG GAG GTG
GAG 

913 

Lys Lys His Arg Glu Leu Ile Gln Glu Glu Gln Ile Gln Glu Val
Glu 

285 290 295 

300 

TGC TCA GTA GAA TCA GAA GTC AGC CAG AAG TCT TTG TGG AAC TAG 
GAC 

961 

Cys Ser Val Glu Ser Glu Val Ser Gln Lys Ser Leu Trp Asn Tyr
Asp 

305 310 315 



TAC GCT CCA CCA CCA CCT CCA GAG CAG TGT CTC TCT GAT GAT GAA 
ATC 
1009 

Tyr Ala Pro Pro Pro Pro Pro Glu Gln Cys Leu Ser Asp Asp Glu
Ile

320 325 330 



ACG ATG ATG GCT CCT GTG GAG TCC AAA TTT TCC CAA TCA ACT GGA 
GAT 
1057 

Thr Met Met Ala Pro Val Glu Ser Lys Phe Ser Gln Ser Thr Gly
Asp 

335 340 345 



ATA GAT AAA GTG ACA GAT ACC AAA CCA AGA GTG GCT GAG TGG CGT 
TAT 
1105 

Ile Asp Lys Val Thr Asp Thr Lys Pro Arg Val Ala Glu Trp Arg
Tyr 

350 355 360 

GGG CCT GCC CGA CTG TGG TAT GAT ATG CTG GGT GTC CCT GAA GAT
GGC
1153 

Gly Pro Ala Arg Leu Trp Tyr Asp Met Leu Gly Val Pro Glu Asp 
Gly 

365 370 375 

380 

AGT GGG TTT GAC TAT GGC TTC AAA CTG AGA AAG ACA GAA CAT GAA 
CCT 
1201 

Ser Gly Phe Asp Tyr Gly Phe Lys Leu Arg Lys Thr Glu His Glu 
Pro 






385 390 395 



GTG ATA AAA TCT AGA ATG ATA GAG GAA TTT AGG AAA CTT GAG GAA 
AAC 
1249 

Val Ile Lys Ser Arg Met Ile Glu Glu Phe Arg Lys Leu Glu Glu
Asn 

400 405 410 



AAT GGC ACT GAT CTT CTG GCT GAT GAA AAC TTC CTG ATG GTG ACA 
GAG 
1297 

Asn Gly Thr Asp Leu Leu Ala Asp Glu Asn Phe Leu Met Val Thr 
Gln

415 420 425 



CTG CAT TGG GAG GAT GAT ATC ATC TGG GAT GGG GAG GAT GTC AAA 
CAC 
1345 

Leu His Trp Glu Asp Asp Ile Ile Trp Asp Gly Glu Asp Val Lys
His 

430 435 440 



AAA GGG ACA AAA CCT CAG CGT GCA AGC CTG GCA GGC TGG CTT CCT 
TCT 
1393 

Lys Gly Thr Lys Pro Gln Arg Ala Ser Leu Ala Gly Trp Leu Pro
Ser 

445 450 455 

460 

AGC ATG ACT AGG AAT GCG ATG GCT TAC AAT GTT CAG CAA GGT TTT 
GCA 
1441 

Ser Met Thr Arg Asn Ala Met Ala Tyr Asn Val Gln Gln Gly Phe
Ala 

465 470 475 



GCC ACT CTT GAT GAT GAC AAA CCT TGG TAC TCC ATT TTT CCC ATT 
GAC 
1489 

Ala Thr Leu Asp Asp Asp Lys Pro Trp Tyr Ser Ile Phe Pro Ile
Asp 

480 485 490 



AAT GAG GAT CTG GTA TAT GGA CGC TGG GAG GAC AAT ATC ATT TGG 
GAT 
1537 
Asn Glu Asp Leu Val Tyr Gly Arg Trp Glu Asp Asn Ile Ile Trp
Asp 

495 500 505 



GCT CAG GCC ATG CCC CGG CTG TTG GAA CCT CCT GTT TTG ACA CTT 
GAT 
1585 

Ala Gln Ala Met Pro Arg Leu Leu Glu Pro Pro Val Leu Thr Leu
Asp 

510 515 520 



CCC AAT GAT GAG AAC CTC ATT TTG GAA ATT CCT GAT GAG AAG GAA 
GAG 
1633 

Pro Asn Asp Glu Asn Leu Ile Leu Glu Ile Pro Asp Glu Lys Glu
Glu 

525 530 535 

540 

GCC ACC TCT AAC TCC CCC TCC AAG GAG AGT AAG AAG GAA TCA TCT 
CTG 
1681 

Ala Thr Ser Asn Ser Pro Ser Lys Glu Ser Lys Lys Glu Ser Ser 
Leu 

545 550 555 



AAG AAG AGT CGA ATT CTC TTA GGG AAA ACA GGA GTC ATC AAG GAG 
GAA 
1729 

Lys Lys Ser Arg Ile Leu Leu Gly Lys Thr Gly Val Ile Lys Glu
Glu 

560 565 570 



CCA CAG CAG AAC ATG TCT CAG CCA GAA GTG AAA GAT CCA TGG AAT 
CTC 
1777 

Pro Gln Gln Asn Met Ser Gln Pro Glu Val Lys Asp Pro Trp Asn
Leu 

575 580 585 



TCC AAT GAT GAG TAT TAT TAT CCC AAG CAA CAG GGT CTT CGA GGC 
ACC 
1825 

Ser Asn Asp Glu Tyr Tyr Tyr Pro Lys Gln Gln Gly Leu Arg Gly
Thr 

590 595 600 
TTT GGA GGG AAT ATT ATC CAG CAT TCA ATT CCT GCT GTG GAA TTA
CGG 
1873 

Phe Gly Gly Asn Ile Ile Gln His Ser Ile Pro Ala Val Glu Leu
Arg 

605 610 615 

620 

CAG CCC TTC TTT CCC ACC CAC ATG GGG CCC ATC AAA CTC CGG CAG 
TTC 
1921 

Gln Pro Phe Phe Pro Thr His Met Gly Pro Ile Lys Leu Arg Gln
Phe 

625 630 635 



CAT CGC CCA CCT CTG AAA AAG TAC TCA TTT GGT GCA CTT TCT CAG 
CCA 
1969 

His Arg Pro Pro Leu Lys Lys Tyr Ser Phe Gly Ala Leu Ser Gln
Pro 

640 645 650 

GGT CCC CAC TCA GTC CAA CCT TTG CTA AAG CAC ATC AAA AAA AAG 
GCC 
2017 

Gly Pro His Ser Val Gln Pro Leu Leu Lys His Ile Lys Lys Lys
Ala 

655 660 665 

AAG ATG AGA GAA CAA GAG AGG CAA GCT TCA GGT GGT GGA GAG ATG 
TTT 
2065 

Lys Met Arg Glu Gln Glu Arg Gln Ala Ser Gly Gly Gly Glu Met
Phe 

670 675 680 

TTT ATG CGC ACA CCT CAG GAC CTC ACA GGC AAA GAT GGT GAT CTT 
ATT 
2113 

Phe Met Arg Thr Pro Gln Asp Leu Thr Gly Lys Asp Gly Asp Leu
Ile

685 690 695 

700 

CTT GCA GAA TAT AGT GAG GAA AAT GGA CCC TTA ATG ATG CAG GTT 
GGC 
2161 

Leu Ala Glu Tyr Ser Glu Glu Asn Gly Pro Leu Met Met Gln Val
Gly 






WO 94/17087



PCT/US94/01114 



705 710 715 



ATG GCA ACC AAG ATA AAG AAC TAT TAT AAA CGG AAA CCT GGA AAA 
GAT 
2209 

Met Ala Thr Lys Ile Lys Asn Tyr Tyr Lys Arg Lys Pro Gly Lys
Asp 

720 725 730 



CCT GGA GCA CCA GAT TGT AAA TAT GGG GAA ACT GTT TAG TGC CAT 
ACA 
2257 

Pro Gly Ala Pro Asp Cys Lys Tyr Gly Glu Thr Val Tyr Cys His 
Thr 

735 740 745 



TCT CCT TTC CTG GGT TCT CTC CAT CCT GGC CAA TTG CTG CAA GCA 
TTT 
2305 

Ser Pro Phe Leu Gly Ser Leu His Pro Gly Gln Leu Leu Gln Ala
Phe 

750 755 760 



GAG AAC AAC CTT TTT CGT GCT CCA ATT TAT CTT CAT AAG ATG CCA 
GAA 
2353 

Glu Asn Asn Leu Phe Arg Ala Pro Ile Tyr Leu His Lys Met Pro
Glu 

765 770 775 

780 

ACT GAT TTC TTG ATC ATT CGG ACA AGA CAG GGT TAC TAT ATT CGG 
GAA 
2401 

Thr Asp Phe Leu Ile Ile Arg Thr Arg Gln Gly Tyr Tyr Ile Arg
Glu 

785 790 795 



TTA GTG GAT ATT TTT GTG GTT GGC CAG CAG TGT CCC TTG TTT GAA 
GTT 
2449 

Leu Val Asp Ile Phe Val Val Gly Gln Gln Cys Pro Leu Phe Glu
Val 

800 805 810 



CCT GGG CCT AAC TCC AAA AGG GCC AAT ACG CAT ATT CGA GAC TTT 
CTA 
2497 
Pro Gly Pro Asn Ser Lys Arg Ala Asn Thr His Ile Arg Asp Phe
Leu 

815 820 825

CAG GTT TTT ATT TAC CGC CTT TTC TGG AAA AGT AAA GAT CGG CCA 
CGG 
2545 

Gln Val Phe Ile Tyr Arg Leu Phe Trp Lys Ser Lys Asp Arg Pro
Arg 

830 835 840



AGG ATA CGA ATG GAA GAT ATA AAA AAA GCC TTT CCT TCC CAT TCA
GAA 
2593 

Arg Ile Arg Met Glu Asp Ile Lys Lys Ala Phe Pro Ser His Ser
Glu 

845 850 855 

860 

AGC AGC ATC CGG AAG AGG CTA AAG CTC TGC GCT GAC TTC AAA CGC 
ACA 
2641 

Ser Ser Ile Arg Lys Arg Leu Lys Leu Cys Ala Asp Phe Lys Arg
Thr 

865 870 875 



GGG ATG GAC TCA AAC TGG TGG GTG CTT AAG TCT GAT TTT CGT TTA 
CCA 
2689 

Gly Met Asp Ser Asn Trp Trp Val Leu Lys Ser Asp Phe Arg Leu 
Pro 

880 885 890 



ACG GAA GAA GAG ATC AGA GCT ATG GTG TCA CCA GAG CAG TGC TGT 
GCT 
2737 

Thr Glu Glu Glu Ile Arg Ala Met Val Ser Pro Glu Gln Cys Cys
Ala 

895 900 905 



TAT TAT AGC ATG ATA GCT GCA GAG CAA CGA CTG AAG GAT GCT GGC 
TAT 
2785 

Tyr Tyr Ser Met Ile Ala Ala Glu Gln Arg Leu Lys Asp Ala Gly
Tyr 

910 915 920 
GGT GAG AAA TCC TTT TTT GCT CCA GAA GAA GAA AAT GAG GAA GAT
TTC 
2833 

Gly Glu Lys Ser Phe Phe Ala Pro Glu Glu Glu Asn Glu Glu Asp 
Phe 

925 930 935 

940 

CAG ATG AAG ATT GAT GAT GAA GTT CGC ACT GCC CCT TGG AAC ACC 
ACA 
2881 

Gln Met Lys Ile Asp Asp Glu Val Arg Thr Ala Pro Trp Asn Thr
Thr 

945 950 955 

AGG GCC TTC ATT GCT GCC ATG AAG GGC AAG TGT CTG CTA GAG GTG 
ACT 
2929 

Arg Ala Phe Ile Ala Ala Met Lys Gly Lys Cys Leu Leu Glu Val
Thr 

960 965 970 



GGG GTG GCA GAT CCC ACG GGG TGT GGT GAA GGA TTC TCC TAT GTG 
AAG 
2977 

Gly Val Ala Asp Pro Thr Gly Cys Gly Glu Gly Phe Ser Tyr Val 
Lys 

975 980 985 



ATT CCA AAC AAA CCA ACA CAG CAG AAG GAT GAT AAA GAA CCG CAG 
CCA 
3025 

Ile Pro Asn Lys Pro Thr Gln Gln Lys Asp Asp Lys Glu Pro Gln
Pro 

990 995 1000 



GTG AAG AAG ACA GTG ACA GGA ACA GAT GCA GAC CTT CGT CGC CTT 
TCC 
3073 

Val Lys Lys Thr Val Thr Gly Thr Asp Ala Asp Leu Arg Arg Leu 
Ser 

1005 1010 1015 

1020 

CTG AAA AAT GCC AAG CAA CTT CTA CGT AAA TTT GGT GTG CCT GAG 
GAA 
3121 

Leu Lys Asn Ala Lys Gln Leu Leu Arg Lys Phe Gly Val Pro Glu
Glu 
1025 1030 1035



GAG ATT AAA AAG TTG TCC CGC TGG GAA GTG ATT GAT GTG GTG CGC
ACA 
3169 

Glu Ile Lys Lys Leu Ser Arg Trp Glu Val Ile Asp Val Val Arg
Thr 

1040 1045 1050 



ATG TCA ACA GAA CAG GCT CGT TCT GGA GAG GGG CCC ATG AGT AAA 
TTT 
3217 

Met Ser Thr Glu Gln Ala Arg Ser Gly Glu Gly Pro Met Ser Lys
Phe 

1055 1060 1065 



GCC CGT GGA TCA AGG TTT TCT GTG GCT GAG CAT CAA GAG CGT TAC 
AAA 
3265 

Ala Arg Gly Ser Arg Phe Ser Val Ala Glu His Gln Glu Arg Tyr
Lys 

1070 1075 1080 



GAG GAA TGT CAG CGC ATC TTT GAC CTA CAG AAC AAG GTT CTG TCA 
TCA 
3313 

Glu Glu Cys Gln Arg Ile Phe Asp Leu Gln Asn Lys Val Leu Ser
Ser 

1085 1090 1095 

1100

ACT GAA GTC TTA TCA ACT GAC ACA GAC AGC AGC TCA GCT GAA GAT 
AGT 
3361 

Thr Glu Val Leu Ser Thr Asp Thr Asp Ser Ser Ser Ala Glu Asp 
Ser 

1105 1110 1115 



GAC TTT GAA GAA ATG GGA AAG AAC ATT GAG AAC ATG TTG CAG AAC 
AAG 
3409 

Asp Phe Glu Glu Met Gly Lys Asn Ile Glu Asn Met Leu Gln Asn
Lys 

1120 1125 1130 



AAA ACC AGC TCT CAG CTT TCA CGT GAA CGG GAG GAA CAG GAG CGG 
AAG 
3457 
Lys Thr Ser Ser Gln Leu Ser Arg Glu Arg Glu Glu Gln Glu Arg
Lys 

1135 1140 1145 



GAA CTA CAG CGA ATG CTA CTG GCA GCA GGC TCA GCA GCA TCC GGA 
AAC 
3505 

Glu Leu Gln Arg Met Leu Leu Ala Ala Gly Ser Ala Ala Ser Gly
Asn 

1150 1155 1160 



AAT CAC AGA GAT GAT GAC ACA GCT TCC GTG ACT AGC CTT AAC TCT 
TCT 
3553 

Asn His Arg Asp Asp Asp Thr Ala Ser Val Thr Ser Leu Asn Ser 
Ser 

1165 1170 1175 

1180 

GCC ACT GGA CGC TGT CTC AAG ATT TAT CGC ACG TTT CGA GAT GAA 
GAG 
3601 

Ala Thr Gly Arg Cys Leu Lys Ile Tyr Arg Thr Phe Arg Asp Glu
Glu 

1185 1190 1195 



GGG AAA GAG TAT GTT CGC TGT GAG ACA GTC CGA AAA CCA GCT GTC 
ATT 
3649 

Gly Lys Glu Tyr Val Arg Cys Glu Thr Val Arg Lys Pro Ala Val 
Ile

1200 1205 1210 



GAT GCC TAT GTG CGC ATA CGG ACT ACA AAA GAT GAG GAA TTC ATT 
CGA 
3697 

Asp Ala Tyr Val Arg Ile Arg Thr Thr Lys Asp Glu Glu Phe Ile
Arg 

1215 1220 1225 



AAA TTT GCC CTT TTT GAT GAA CAA CAT CGG GAA GAG ATG CGA AAA 
GAA 
3745 

Lys Phe Ala Leu Phe Asp Glu Gln His Arg Glu Glu Met Arg Lys
Glu 

1230 1235 1240 
CGG CGG AGG ATT CAA GAG CAA CTG AGG CGG CTT AAG AGG AAC CAG
GAA 
3793 

Arg Arg Arg Ile Gln Glu Gln Leu Arg Arg Leu Lys Arg Asn Gln
Glu 

1245 1250 1255 

1260 

AAG CTT AAG GGT CCT CCT GAG AAG AAG CCC AAG AAA ATG 



Lys Leu Lys Gly Pro Pro Glu Lys Lys Pro Lys Lys Met 
1265 1270 1275 



GAG CGT CCT GAC CTA AAA CTG AAA TGT GGG GCA TGT GGT GCC ATT 
GGA 
3889 

Glu Arg Pro Asp Leu Lys Leu Lys Cys Gly Ala Cys Gly Ala Ile
Gly 

1280 1285 1290 



CAC ATG AGG ACT AAC AAA TTC TGC CCC CTC TAT TAT CAA ACA AAT 
GCG 
3937 

His Met Arg Thr Asn Lys Phe Cys Pro Leu Tyr Tyr Gln Thr Asn
Ala 

1295 1300 1305 



CCA CCT TCC AAC CCT GTT GCC ATG ACA GAA GAA CAG GAG GAG GAG 
TTG 
3985 

Pro Pro Ser Asn Pro Val Ala Met Thr Glu Glu Gln Glu Glu Glu
Leu 

1310 1315 1320 



GAA AAG ACA GTC ATT CAT AAT GAT AAT GAA GAA CTT ATC AAG GTT 
GAA 
4033 

Glu Lys Thr Val Ile His Asn Asp Asn Glu Glu Leu Ile Lys Val
Glu 

1325 1330 1335 

1340 

GGG ACC AAA ATT GTC TTG GGG AAA CAG CTA ATT GAG AGT GCG GAT 
GAG 
4081 

Gly Thr Lys Ile Val Leu Gly Lys Gln Leu Ile Glu Ser Ala Asp
Glu 
1345 1350 1355



GTT CGC AGA AAA TCT CTG GTT CTC AAG TTT CCT AAA GAG GAG CTT
CCT
4129 

Val Arg Arg Lys Ser Leu Val Leu Lys Phe Pro Lys Gln Gln Leu
Pro 

1360 1365 1370 



CCA AAG AAG AAA CGG CGA GTT GGA ACC ACT GTT CAC TGT GAC TAT 
TTG 
4177 

Pro Lys Lys Lys Arg Arg Val Gly Thr Thr Val His Cys Asp Tyr 
Leu 

1375 1380 1385 



AAT AGA CCT CAT AAG TCC ATC CAC CGG CGC CGC ACA GAC CCT ATG 
GTG 
4225 

Asn Arg Pro His Lys Ser Ile His Arg Arg Arg Thr Asp Pro Met
Val 

1390 1395 1400 



ACG CTG TCG TCC ATC TTG GAG TCT ATC ATC AAT GAC ATG AGA GAT 
CTT 
4273 

Thr Leu Ser Ser Ile Leu Glu Ser Ile Ile Asn Asp Met Arg Asp
Leu 

1405 1410 1415 

1420 

CCA AAT ACA TAC CCT TTC CAC ACT CCA GTC AAT GCA AAG GTT GTA 
AAG 
4321 

Pro Asn Thr Tyr Pro Phe His Thr Pro Val Asn Ala Lys Val Val 
Lys 

1425 1430 1435 



GAC TAC TAC AAA ATC ATC ACT CGG CCA ATG GAC CTA CAA ACA CTC 
CGC 
4369 

Asp Tyr Tyr Lys Ile Ile Thr Arg Pro Met Asp Leu Gln Thr Leu
Arg 

1440 1445 1450 



GAA AAC GTG CGT AAA CGC CTC TAC CCA TCT CGG GAA GAG TTC AGA 
GAG 
4417 
Glu Asn Val Arg Lys Arg Leu Tyr Pro Ser Arg Glu Glu Phe Arg
Glu 

1455 1460 1465 



CAT CTG GAG CTA ATT GTG AAA AAT AGT GCA ACC TAG AAT GGG CCA 
AAA 
4465 

His Leu Glu Leu Ile Val Lys Asn Ser Ala Thr Tyr Asn Gly Pro
Lys 

1470 1475 1480 



CAC TCA TTG ACT CAG ATC TCT CAA TCC ATG CTG GAT CTC TGT GAT 
GAA 
4513 

His Ser Leu Thr Gln Ile Ser Gln Ser Met Leu Asp Leu Cys Asp
Glu 

1485 1490 1495 

1500 

AAA CTC AAA GAG AAA GAA GAC AAA TTA GCT CGC TTA GAG AAA GCT 
ATC 
4561 

Lys Leu Lys Glu Lys Glu Asp Lys Leu Ala Arg Leu Glu Lys Ala 
Ile

1505 1510 1515 



AAC CCC TTG CTG GAT GAT GAT GAC CAA GTG GCG TTT TCT TTC ATT 
CTG 
4609 

Asn Pro Leu Leu Asp Asp Asp Asp Gln Val Ala Phe Ser Phe Ile
Leu 

1520 1525 1530 

GAC AAC ATT GTC ACC CAG AAA ATG ATG GCA GTT CCA GAT TCT TGG 
CCA 
4657 

Asp Asn Ile Val Thr Gln Lys Met Met Ala Val Pro Asp Ser Trp
Pro 

1535 1540 1545 



TTT CAT CAC CCA GTT AAT AAG AAA TTT GTT CCA GAT TAT TAC AAA 
GTG 
4705 

Phe His His Pro Val Asn Lys Lys Phe Val Pro Asp Tyr Tyr Lys 
Val 

1550 1555 1560 
ATT GTC AAT CCA ATG GAT TTA GAG ACC ATA CGT AAG AAC ATC TCC
AAG 
4753 

Ile Val Asn Pro Met Asp Leu Glu Thr Ile Arg Lys Asn Ile Ser
Lys 

1565 1570 1575 

1580 

CAC AAG TAT CAG AGT CGG GAG AGC TTT CTG GAT GAT GTA AAC CTT
ATT 
4801 

His Lys Tyr Gln Ser Arg Glu Ser Phe Leu Asp Asp Val Asn Leu
Ile

1585 1590 1595 



CTG GCC AAC AGT GTT AAG TAT AAT GGA CCT GAG AGT CAG TAT ACT 
AAG 
4849 

Leu Ala Asn Ser Val Lys Tyr Asn Gly Pro Glu Ser Gln Tyr Thr
Lys 

1600 1605 1610 



ACT GCC CAG GAG ATT GTG AAC GTC TGT TAC CAG ACA TTG ACT GAG
TAT 
4897 

Thr Ala Gln Glu Ile Val Asn Val Cys Tyr Gln Thr Leu Thr Glu
Tyr 

1615 1620 1625 



GAT GAA CAT TTG ACT CAA CTT GAG AAG GAT ATT TGT ACT GCT AAA 
GAA 
4945 

Asp Glu His Leu Thr Gln Leu Glu Lys Asp Ile Cys Thr Ala Lys
Glu 

1630 1635 1640 



GCA GCT TTG GAG GAA GCA GAA TTA GAA AGC CTG GAC CCA ATG ACC 
CCA 
4993 

Ala Ala Leu Glu Glu Ala Glu Leu Glu Ser Leu Asp Pro Met Thr 
Pro 

1645 1650 1655 

1660 

GGG CCC TAC ACG CCT CAG CCT CCT GAT TTG TAT GAT ACC AAC ACA 
TCC 
5041 

Gly Pro Tyr Thr Pro Gln Pro Pro Asp Leu Tyr Asp Thr Asn Thr
Ser 
1665 1670 1675

CTC AGT ATG TCT CGA GAT GGC TCT GTA TTT CAA GAT GAG AGC AAT
ATG 
5089 

Leu Ser Met Ser Arg Asp Ala Ser Val Phe Gln Asp Glu Ser Asn
Met 

1680 1685 1690 

TCT GTC TTG GAT ATC CCC AGT GCC ACT CCA GAA AAG CAG GTA ACA 
CAG 
5137 

Ser Val Leu Asp Ile Pro Ser Ala Thr Pro Glu Lys Gln Val Thr
Gln

1695 1700 1705 



GAA GGT GAA GAT GGA GAT GGT GAT CTT GCA GAT GAA GAG GAA GGA 
ACT 
5185 

Glu Gly Glu Asp Gly Asp Gly Asp Leu Ala Asp Glu Glu Glu Gly 
Thr 

1710 1715 1720 

GTA CAA CAG CCT CAA GCC AGT GTC CTG TAT GAG GAT TTG CTT ATG 
TCT 
5233 

Val Gln Gln Pro Gln Ala Ser Val Leu Tyr Glu Asp Leu Leu Met
Ser 

1725 1730 1735 

1740 

GAA GGA GAA GAT GAT GAG GAA GAT GCT GGG AGT GAT GAA GAA GGA 
GAC 
5281 

Glu Gly Glu Asp Asp Glu Glu Asp Ala Gly Ser Asp Glu Glu Gly 
Asp 

1745 1750 1755 



AAT CCT TTC TCT GCT ATC CAG CTG AGT GAA AGT GGA AGT GAC TCT 
GAT 
5329 

Asn Pro Phe Ser Ala Ile Gln Leu Ser Glu Ser Gly Ser Asp Ser
Asp 

1760 1765 1770 

GTG GGA TCT GGT GGA ATA AGA CCC AAA CAA CCC CGC ATG CTT CAG 
GAG 
5377 
Val Gly Ser Gly Gly Ile Arg Pro Lys Gln Pro Arg Met Leu Gln
Glu 

1775 1780 1785 



AAC ACA AGG ATG GAG ATG GAA AAT GAA GAA AGC ATG ATG TCC TAT 
GAG 
5425 

Asn Thr Arg Met Asp Met Glu Asn Glu Glu Ser Met Met Ser Tyr 
Glu 

1790 1795 1800 



GGA GAG GGT GGG GAG GCT TCC CAT GGT TTG GAG GAT AGC AAC ATC 
AGT 
5473 

Gly Asp Gly Gly Glu Ala Ser His Gly Leu Glu Asp Ser Asn Ile
Ser 

1805 1810 1815 

1820 

TAT GGG AGC TAT GAG GAG CCT GAT CCC AAG TCG AAC ACC CAA GAC 
ACA 
5521 

Tyr Gly Ser Tyr Glu Glu Pro Asp Pro Lys Ser Asn Thr Gln Asp
Thr 

1825 1830 1835 



AGC TTC AGC AGC ATC GGT GGG TAT GAG GTA TCA GAG GAG GAA GAA 
GAT 
5569 

Ser Phe Ser Ser Ile Gly Gly Tyr Glu Val Ser Glu Glu Glu Glu
Asp 

1840 1845 1850 



GAG GAG GAG GAA GAG CAG CGC TCT GGG CCG AGC GTA CTA AGC CAG 
GTC 
5617 

Glu Glu Glu Glu Glu Gln Arg Ser Gly Pro Ser Val Leu Ser Gln
Val 

1855 1860 1865



CAC CTG TCA GAG GAC GAG GAG GAC AGT GAG GAT TTC CAC TCC ATT 
GCT 
5665 

His Leu Ser Glu Asp Glu Glu Asp Ser Glu Asp Phe His Ser Ile
Ala 

1870 1875 1880 
GGG GAC ACT GAC TTG GAC TCT GAT GAA TGAGGCTTCC TTTGGGCCTC
5712 

Gly Asp Ser Asp Leu Asp Ser Asp Glu 
1885 1890 

CTTGGTCAGC CTTCCCTGTT CTCCAGCCTA GGTGGTTCAC CTTTCCCCAA 
TTTGTTCATA 
5772 

TTTGTACAGT ATCTGATCCT GAAATCATGA AATTAACTAA CACCTTAGCC 
TTTTTAAAAG 
5832 

TAGTAAGTAA ATGATAATAA ATCACCTCTC CTAATCTTCC TGGGGCAATG 
TCACCCTTTG 
5892 

ATTTAAAACA AAGCAACCCC CTTTCCCCTA CCACTACGGA AAAGAGCAAG 
CTCATTTTTC 
5952 

CGTGTCCTCC 

5962 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1893 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Gly Pro Gly Cys Asp Leu Leu Leu Arg Thr Ala Ala Thr Ile
Thr 

1 5 10 

15 

Ala Ala Ala Ile Met Ser Asp Thr Asp Ser Asp Glu Asp Ser Ala
Gly 

20 25 30 



Gly Gly Pro Phe Ser Leu Ala Gly Phe Leu Phe Gly Asn Ile Asn
Gly 

35 40 45 
Ala Gly Gln Leu Glu Gly Glu Ser Val Leu Asp Asp Glu Cys Lys
Lys 

50 55 60 



His Leu Ala Gly Leu Gly Ala Leu Gly Leu Gly Ser Leu Ile Thr
Glu 

65 70 75 

80 

Leu Thr Ala Asn Glu Glu Leu Thr Gly Thr Asp Gly Ala Leu Val 
Asn 

85 90

95 

Asp Glu Gly Trp Val Arg Ser Thr Glu Asp Ala Val Asp Tyr Ser 
Asp 

100 105 110 



Ile Asn Glu Val Ala Glu Asp Glu Ser Arg Arg Tyr Gln Gln Thr
Met 

115 120 125 



Gly Ser Leu Gln Pro Leu Cys His Ser Asp Tyr Asp Glu Asp Asp
Tyr 

130 135 140 



Asp Ala Asp Cys Glu Asp Ile Asp Cys Lys Leu Met Pro Pro Pro
Pro 

145 150 155 

160 

Pro Pro Pro Gly Pro Met Lys Lys Asp Lys Asp Gln Asp Ser Ile
Thr 

165 170 175 



Gly Val Ser Glu Asn Gly Glu Gly Ile Ile Leu Pro Ser Ile Ile
Ala 

180 185 190 



Pro Ser Ser Leu Ala Ser Glu Lys Val Asp Phe Ser Ser Ser Ser 
Asp 

195 200 205 



Ser Glu Ser Glu Met Gly Pro Gln Glu Ala Thr Gln Ala Glu Ser
Glu 

210 215 220 
Asp Gly Lys Leu Thr Leu Pro Leu Ala Gly Ile Met Gln His Asp
Ala 

225 230 235

240 

Thr Lys Leu Leu Pro Ser Val Thr Glu Leu Phe Pro Glu Phe Arg 
Pro 

245 250 255 



Gly Lys Val Leu Arg Phe Leu Arg Leu Phe Gly Pro Gly Lys Asn 
Val 

260 265 270 



Pro Ser Val Trp Arg Ser Ala Arg Arg Lys Arg Lys Lys Lys His 
Arg 

275 280 285 



Glu Leu Ile Gln Glu Glu Gln Ile Gln Glu Val Glu Cys Ser Val
Glu 

290 295 300 



Ser Glu Val Ser Gln Lys Ser Leu Trp Asn Tyr Asp Tyr Ala Pro
Pro 

305 310 315 

320 

Pro Pro Pro Glu Gln Cys Leu Ser Asp Asp Glu Ile Thr Met Met
Ala 

325 330 335 



Pro Val Glu Ser Lys Phe Ser Gln Ser Thr Gly Asp Ile Asp Lys
Val 

340 345 350 



Thr Asp Thr Lys Pro Arg Val Ala Glu Trp Arg Tyr Gly Pro Ala 
Arg 

355 360 365 



Leu Trp Tyr Asp Met Leu Gly Val Pro Glu Asp Gly Ser Gly Phe 
Asp 

370 375 380 



Tyr Gly Phe Lys Leu Arg Lys Thr Glu His Glu Pro Val Ile Lys
Ser 

385 390 395

400 
Arg Met Ile Glu Glu Phe Arg Lys Leu Glu Glu Asn Asn Gly Thr
Asp 

405 410 415 



Leu Leu Ala Asp Glu Asn Phe Leu Met Val Thr Gln Leu His Trp
Glu 

420 425 430 



Asp Asp Ile Ile Trp Asp Gly Glu Asp Val Lys His Lys Gly Thr
Lys 

435 440 445 



Pro Gln Arg Ala Ser Leu Ala Gly Trp Leu Pro Ser Ser Met Thr
Arg 

450 455 460 



Asn Ala Met Ala Tyr Asn Val Gln Gln Gly Phe Ala Ala Thr Leu
Asp 

465 470 475 

480 

Asp Asp Lys Pro Trp Tyr Ser Ile Phe Pro Ile Asp Asn Glu Asp
Leu 

485 490 495 



Val Tyr Gly Arg Trp Glu Asp Asn Ile Ile Trp Asp Ala Gln Ala
Met 

500 505 510 



Pro Arg Leu Leu Glu Pro Pro Val Leu Thr Leu Asp Pro Asn Asp 
Glu 

515 520 525 



Asn Leu Ile Leu Glu Ile Pro Asp Glu Lys Glu Glu Ala Thr Ser
Asn 

530 535 540 



Ser Pro Ser Lys Glu Ser Lys Lys Glu Ser Ser Leu Lys Lys Ser 
Arg 

545 550 555 

560 

Ile Leu Leu Gly Lys Thr Gly Val Ile Lys Glu Glu Pro Gln Gln
Asn 

565 570 575 
Met Ser Gln Pro Glu Val Lys Asp Pro Trp Asn Leu Ser Asn Asp
Glu 

580 585 590 



Tyr Tyr Tyr Pro Lys Gln Gln Gly Leu Arg Gly Thr Phe Gly Gly
Asn 

595 600 605 



Ile Ile Gln His Ser Ile Pro Ala Val Glu Leu Arg Gln Pro Phe
Phe 

610 615 620 



Pro Thr His Met Gly Pro Ile Lys Leu Arg Gln Phe His Arg Pro
Pro 

625 630 635 

640 

Leu Lys Lys Tyr Ser Phe Gly Ala Leu Ser Gln Pro Gly Pro His
Ser 

645 650 655 



Val Gln Pro Leu Leu Lys His Ile Lys Lys Lys Ala Lys Met Arg
Glu 

660 665 670 



Gln Glu Arg Gln Ala Ser Gly Gly Gly Glu Met Phe Phe Met Arg
Thr 

675 680 685 



Pro Gln Asp Leu Thr Gly Lys Asp Gly Asp Leu Ile Leu Ala Glu
Tyr 

690 695 700 



Ser Glu Glu Asn Gly Pro Leu Met Met Gln Val Gly Met Ala Thr
Lys 

705 710 715 

720 

Ile Lys Asn Tyr Tyr Lys Arg Lys Pro Gly Lys Asp Pro Gly Ala
Pro 

725 730 735 



Asp Cys Lys Tyr Gly Glu Thr Val Tyr Cys His Thr Ser Pro Phe 
Leu 

740 745 750 
Gly Ser Leu His Pro Gly Gln Leu Leu Gln Ala Phe Glu Asn Asn
Leu 

755 760 765 



Phe Arg Ala Pro Ile Tyr Leu His Lys Met Pro Glu Thr Asp Phe
Leu 

770 775 780



Ile Ile Arg Thr Arg Gln Gly Tyr Tyr Ile Arg Glu Leu Val Asp
Ile

785 790 795 

800 

Phe Val Val Gly Gln Gln Cys Pro Leu Phe Glu Val Pro Gly Pro
Asn 

805 810 815 



Ser Lys Arg Ala Asn Thr His Ile Arg Asp Phe Leu Gln Val Phe
Ile

820 825 830 



Tyr Arg Leu Phe Trp Lys Ser Lys Asp Arg Pro Arg Arg Ile Arg
Met 

835 840 845 



Glu Asp Ile Lys Lys Ala Phe Pro Ser His Ser Glu Ser Ser Ile
Arg 

850 855 860 



Lys Arg Leu Lys Leu Cys Ala Asp Phe Lys Arg Thr Gly Met Asp 
Ser 

865 870 875 

880 

Asn Trp Trp Val Leu Lys Ser Asp Phe Arg Leu Pro Thr Glu Glu 
Glu 

885 890 895 



Ile Arg Ala Met Val Ser Pro Glu Gln Cys Cys Ala Tyr Tyr Ser
Met 

900 905 910 



Ile Ala Ala Glu Gln Arg Leu Lys Asp Ala Gly Tyr Gly Glu Lys
Ser 

915 920 925 
Phe Phe Ala Pro Glu Glu Glu Asn Glu Glu Asp Phe Gln Met Lys
Ile

930 935 940 



Asp Asp Glu Val Arg Thr Ala Pro Trp Asn Thr Thr Arg Ala Phe 
Ile

945 950 955 

960 

Ala Ala Met Lys Gly Lys Cys Leu Leu Glu Val Thr Gly Val Ala 
Asp 

965 970 975 



Pro Thr Gly Cys Gly Glu Gly Phe Ser Tyr Val Lys Ile Pro Asn
Lys 

980 985 990 



Pro Thr Gln Gln Lys Asp Asp Lys Glu Pro Gln Pro Val Lys Lys
Thr 

995 1000 1005 



Val Thr Gly Thr Asp Ala Asp Leu Arg Arg Leu Ser Leu Lys Asn 
Ala 

1010 1015 1020 



Lys Gln Leu Leu Arg Lys Phe Gly Val Pro Glu Glu Glu Ile Lys
Lys 

1025 1030 1035

1040 

Leu Ser Arg Trp Glu Val Ile Asp Val Val Arg Thr Met Ser Thr
Glu 

1045 1050 1055 



Gln Ala Arg Ser Gly Glu Gly Pro Met Ser Lys Phe Ala Arg Gly
Ser 

1060 1065 1070 



Arg Phe Ser Val Ala Glu His Gln Glu Arg Tyr Lys Glu Glu Cys
Gln

1075 1080 1085 



Arg Ile Phe Asp Leu Gln Asn Lys Val Leu Ser Ser Thr Glu Val
Leu 

1090 1095 1100 
Ser Thr Asp Thr Asp Ser Ser Ser Ala Glu Asp Ser Asp Phe Glu
Glu 

1105 1110 1115 

1120 

Met Gly Lys Asn Ile Glu Asn Met Leu Gln Asn Lys Lys Thr Ser
Ser 

1125 1130 1135 



Gln Leu Ser Arg Glu Arg Glu Glu Gln Glu Arg Lys Glu Leu Gln
Arg 

1140 1145 1150 



Met Leu Leu Ala Ala Gly Ser Ala Ala Ser Gly Asn Asn His Arg 
Asp 

1155 1160 1165 



Asp Asp Thr Ala Ser Val Thr Ser Leu Asn Ser Ser Ala Thr Gly 
Arg 

1170 1175 1180 



Cys Leu Lys Ile Tyr Arg Thr Phe Arg Asp Glu Glu Gly Lys Glu
Tyr 

1185 1190 1195 

1200 

Val Arg Cys Glu Thr Val Arg Lys Pro Ala Val Ile Asp Ala Tyr
Val 

1205 1210 1215



Arg Ile Arg Thr Thr Lys Asp Glu Glu Phe Ile Arg Lys Phe Ala
Leu 

1220 1225 1230 



Phe Asp Glu Gln His Arg Glu Glu Met Arg Lys Glu Arg Arg Arg
Ile

1235 1240 1245 



Gln Glu Gln Leu Arg Arg Leu Lys Arg Asn Gln Glu Lys Glu Lys
Leu 

1250 1255 1260 



Lys Gly Pro Pro Glu Lys Lys Pro Lys Lys Met Lys Glu Arg Pro 
Asp 

1265 1270 1275 

1280 
Leu Lys Leu Lys Cys Gly Ala Cys Gly Ala Ile Gly His Met Arg
Thr 

1285 1290 1295 



Asn Lys Phe Cys Pro Leu Tyr Tyr Gln Thr Asn Ala Pro Pro Ser
Asn 

1300 1305 1310 



Pro Val Ala Met Thr Glu Glu Gln Glu Glu Glu Leu Glu Lys Thr
Val 

1315 1320 1325 



Ile His Asn Asp Asn Glu Glu Leu Ile Lys Val Glu Gly Thr Lys
Ile

1330 1335 1340 



Val Leu Gly Lys Gln Leu Ile Glu Ser Ala Asp Glu Val Arg Arg
Lys 

1345 1350 1355 

1360 

Ser Leu Val Leu Lys Phe Pro Lys Gln Gln Leu Pro Pro Lys Lys
Lys 

1365 1370 1375 



Arg Arg Val Gly Thr Thr Val His Cys Asp Tyr Leu Asn Arg Pro 
His 

1380 1385 1390 



Lys Ser Ile His Arg Arg Arg Thr Asp Pro Met Val Thr Leu Ser
Ser 

1395 1400 1405 



Ile Leu Glu Ser Ile Ile Asn Asp Met Arg Asp Leu Pro Asn Thr
Tyr 

1410 1415 1420 



Pro Phe His Thr Pro Val Asn Ala Lys Val Val Lys Asp Tyr Tyr 
Lys 

1425 1430 1435 

1440 

Ile Ile Thr Arg Pro Met Asp Leu Gln Thr Leu Arg Glu Asn Val
Arg 

1445 1450 1455 
Lys Arg Leu Tyr Pro Ser Arg Glu Glu Phe Arg Glu His Leu Glu
Leu 

1460 1465 1470 



Ile Val Lys Asn Ser Ala Thr Tyr Asn Gly Pro Lys His Ser Leu
Thr 

1475 1480 1485 



Gln Ile Ser Gln Ser Met Leu Asp Leu Cys Asp Glu Lys Leu Lys
Glu 

1490 1495 1500 



Lys Glu Asp Lys Leu Ala Arg Leu Glu Lys Ala Ile Asn Pro Leu
Leu 

1505 1510 1515 

1520 

Asp Asp Asp Asp Gln Val Ala Phe Ser Phe Ile Leu Asp Asn Ile
Val 

1525 1530 1535 



Thr Gln Lys Met Met Ala Val Pro Asp Ser Trp Pro Phe His His
Pro 

1540 1545 1550 



Val Asn Lys Lys Phe Val Pro Asp Tyr Tyr Lys Val Ile Val Asn
Pro 

1555 1560 1565 



Met Asp Leu Glu Thr Ile Arg Lys Asn Ile Ser Lys His Lys Tyr
Gln

1570 1575 1580 



Ser Arg Glu Ser Phe Leu Asp Asp Val Asn Leu Ile Leu Ala Asn
Ser 

1585 1590 1595 

1600 

Val Lys Tyr Asn Gly Pro Glu Ser Gln Tyr Thr Lys Thr Ala Gln
Glu 

1605 1610 1615 



Ile Val Asn Val Cys Tyr Gln Thr Leu Thr Glu Tyr Asp Glu His
Leu 

1620 1625 1630 
Thr Gln Leu Glu Lys Asp Ile Cys Thr Ala Lys Glu Ala Ala Leu
Glu 

1635 1640 1645 



Glu Ala Glu Leu Glu Ser Leu Asp Pro Met Thr Pro Gly Pro Tyr 
Thr 

1650 1655 1660 



Pro Gln Pro Pro Asp Leu Tyr Asp Thr Asn Thr Ser Leu Ser Met
Ser 

1665 1670 1675 

1680 

Arg Asp Ala Ser Val Phe Gln Asp Glu Ser Asn Met Ser Val Leu
Asp 

1685 1690 1695 



Ile Pro Ser Ala Thr Pro Glu Lys Gln Val Thr Gln Glu Gly Glu
Asp 

1700 1705 1710 



Gly Asp Gly Asp Leu Ala Asp Glu Glu Glu Gly Thr Val Gln Gln
Pro 

1715 1720 1725 



Gln Ala Ser Val Leu Tyr Glu Asp Leu Leu Met Ser Glu Gly Glu
Asp 

1730 1735 1740 



Asp Glu Glu Asp Ala Gly Ser Asp Glu Glu Gly Asp Asn Pro Phe 
Ser 

1745 1750 1755 

1760 

Ala Ile Gln Leu Ser Glu Ser Gly Ser Asp Ser Asp Val Gly Ser
Gly 

1765 1770 1775 



Gly Ile Arg Pro Lys Gln Pro Arg Met Leu Gln Glu Asn Thr Arg
Met 

1780 1785 1790 



Asp Met Glu Asn Glu Glu Ser Met Met Ser Tyr Glu Gly Asp Gly 
Gly 

1795 1800 1805 
Glu Ala Ser His Gly Leu Glu Asp Ser Asn Ile Ser Tyr Gly Ser
Tyr 

1810 1815 1820 

Glu Glu Pro Asp Pro Lys Ser Asn Thr Gln Asp Thr Ser Phe Ser
Ser 

1825 1830 1835 

1840 

Ile Gly Gly Tyr Glu Val Ser Glu Glu Glu Glu Asp Glu Glu Glu
Glu 

1845 1850 1855 

Glu Gln Arg Ser Gly Pro Ser Val Leu Ser Gln Val His Leu Ser
Glu 

1860 1865 1870 

Asp Glu Glu Asp Ser Glu Asp Phe His Ser Ile Ala Gly Asp Ser
Asp 

1875 1880 1885 



Leu Asp Ser Asp Glu 
1890 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3182 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 972..3002

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CGAGTTTTTT TTTTTTTTTT TTTTACAAGA GCACAAATCC ACATTTATTT
ATTGATTTTT 
60 

CGTTAGTTTA AATCCTTGAG GGGTACAGCA TCACTCGGAT TCTGTGTCCA 
ATGGCCTTAG 
120 
CAGGAAGATT GCTTCGGAAT TTGGCACGAA CCATGCCACT GTTTCCATGG
GCCCGAGTTA 
180 

CTTTTCCCCA GATGACTCTG GTTTTGTTTG GTTTGCCGCC AGGAGTGACT 
GTGTTGTTCT 
240 

TTGCTTTATA TACATAAGCG CATCTCTTGC CCAAATAGAA TTCTGTTTCA 
TCTCGGGCGT 
300 

AAACACCTTC AATTTTAAGA AGAGCTGTGT GCTCCCTTTG GTTCCGGAGA 
CCCCGCTTAT 
360 

AGCCAGCAAA AATGGCCTTG GACCACAGCC TTCCAGACAT AGTTCCTTTT 
AGAAGTCCCG 
420 

TTCCCAGCAG GCCTCCACAG GAGCCAAGAT GGCGCCGAGC CGGGTGAGCA 
GCGTCTCGGC 
480 

TGCCGCTAGA GTTTTCCTGC TCCCCGCGCT CGGGTGGCGG GGGCGGGTCT 
GAGTGGTACC 
540 

CCGGAGGAGA CCCTTTGAAG GTCCCTTGTG GGGACTGGAA AGAGGACGGT 
TGGTTGTGTG 
600 

TCTGTGCTCG TGGGGACCCC GTGTGTGTGC CTGCATTGGA GAGATGTTGC 
AGGAGATGGG 
660 

GTGGGCTCTC TGAACCTCCT TTCGCGCTGC CCGGGGATCT TCGACCTGCT 
TCTCTGCTGG 
720 

GATCTCGCTT AAGTTAACCC TTCCCTGGGA CGCCTTCCTG CCGCCTCCAC
TGATCTGAGG 
780 

AGATCCTGTG ACTGTAGCGT GTTTTATGAG CCTTTACTGG CAGAGGGTAC 
CGCCGGGTAT 
840 

TGAAGGATTC GTAGGAGTTC GCCAGGGAAG TGGGACACGA CCCCCTCTTG 
TAAACCCGGC
900 

GCCAGGCACA GAGGTCTCGG TCTCTCCACC GGGGGCTTCA TCCTTCCAGG 
GAGGAGAAGA 
960

GGGACTCCAG A ATG GCT GAG GAG AAG AAG CTG AAG CTT AGC AAC 
ACT GTG 
1010 

Met Ala Glu Glu Lys Lys Leu Lys Leu Ser Asn 

Thr Val 

1 5 10



CTG CCC TCG GAG TCC ATG AAG GTG GTG GCT GAA TCC ATG GGC ATC 
GCC 
1058 

Leu Pro Ser Glu Ser Met Lys Val Val Ala Glu Ser Met Gly Ile
Ala 

15 20 25 



CAG ATT CAG GAG GAG ACC TGC CAG CTG CTA ACG GAT GAG GTC AGC 
TAG 
1106 

Gln Ile Gln Glu Glu Thr Cys Gln Leu Leu Thr Asp Glu Val Ser
Tyr 

30 35 40 

45 

CGC ATC AAA GAG ATC GCA CAG GAT GCC TTG AAG TTC ATG CAC ATG 
GGG 
1154 

Arg Ile Lys Glu Ile Ala Gln Asp Ala Leu Lys Phe Met His Met
Gly 

50 55 

60 

AAG CGG CAG AAG CTC ACC ACC AGT GAC ATT GAC TAC GCC TTG AAG 
CTA 
1202 

Lys Arg Gln Lys Leu Thr Thr Ser Asp Ile Asp Tyr Ala Leu Lys
Leu 

65 70 75 



AAG AAT GTC GAG CCA CTC TAT GGC TTC CAC GCC CAG GAC TTC ATT 
CCT 
1250 

Lys Asn Val Glu Pro Leu Tyr Gly Phe His Ala Gln Asp Phe Ile
Pro 

80 85 90 



TTC CGC TTC GCC TCT GGT GGG GGC CGG GAG CTT TAC TTC TAT GAG 
GAG 
1298

Phe Arg Phe Ala Ser Gly Gly Gly Arg Glu Leu Tyr Phe Tyr Glu 
Glu 

95 100 105



AAG GAG GTT GAT CTG AGC GAC ATC ATC AAT ACC CCT CTG CCC CGG 
GTG 
1346 

Lys Glu Val Asp Leu Ser Asp Ile Ile Asn Thr Pro Leu Pro Arg
Val 

110 115 120 

125 

CCC CTG GAC GTC TGC CTC AAA GCT CAT TGG CTG AGC ATC GAG GGC 
TGC 
1394 

Pro Leu Asp Val Cys Leu Lys Ala His Trp Leu Ser Ile Glu Gly
Cys 

130 135 140 

CAG CCA GCT ATC CCC GAG AAC CCG CCC CCA GCT CCC AAA GAG CAA 
CAG 
1442 

Gln Pro Ala Ile Pro Glu Asn Pro Pro Pro Ala Pro Lys Glu Gln
Gln

145 150 155 



AAG GCT GAA GCC ACA GAA CCC CTG AAG TCA GCC AAG CCA GGC CAG 
GAG 
1490 

Lys Ala Glu Ala Thr Glu Pro Leu Lys Ser Ala Lys Pro Gly Gln
Glu 

160 165 170 



GAA GAC GGA CCC CTG AAG GGC AAA GGT CAA GGG GCC ACC ACA GCC 
GAC 
1538 

Glu Asp Gly Pro Leu Lys Gly Lys Gly Gln Gly Ala Thr Thr Ala
Asp 

175 180 185 



GGC AAA GGG AAA GAG AAG AAG GCG CCG CCC TTG CTG GAG GGG GCC 
CCC 
1586 

Gly Lys Gly Lys Glu Lys Lys Ala Pro Pro Leu Leu Glu Gly Ala 
Pro 

190 195 200 

205 
TTG CGA CTG AAG CCC CGG AGC ATC CAC GAG TTG TCT GTG GAG CAG
CAG 
1634

Leu Arg Leu Lys Pro Arg Ser Ile His Glu Leu Ser Val Glu Gln
Gln

210 215 220 



CTC TAC TAC AAG GAG ATC ACC GAG GCC TGC GTG GGC TCC TGC GAG 
GCC 
1682 

Leu Tyr Tyr Lys Glu Ile Thr Glu Ala Cys Val Gly Ser Cys Glu
Ala 

225 230 235 



AAG AGG GCG GAA GCC CTG CAA AGC ATT GCC ACG GAC CCT GGA CTG 
TAT 
1730 

Lys Arg Ala Glu Ala Leu Gln Ser Ile Ala Thr Asp Pro Gly Leu
Tyr 

240 245 250 



CAG ATG CTG CCA CGG TTC AGT ACC TTT ATC TCG GAG GGG GTC CGT 
GTG 
1778 

Gln Met Leu Pro Arg Phe Ser Thr Phe Ile Ser Glu Gly Val Arg
Val 

255 260 265 



AAC GTG GTT CAG AAC AAC CTG GCC CTA CTC ATC TAC CTG ATG CGT 
ATG 
1826 

Asn Val Val Gln Asn Asn Leu Ala Leu Leu Ile Tyr Leu Met Arg
Met 

270 275 280 

285 

GTG AAA GCG CTG ATG GAC AAC CCC ACG CTC TAT CTA GAA AAA TAC 
GTC 
1874 

Val Lys Ala Leu Met Asp Asn Pro Thr Leu Tyr Leu Glu Lys Tyr 
Val 

290 295 300 



CAT GAG CTG ATT CCA GCT GTG ATG ACC TGC ATC GTG AGC AGA CAG 
TTG 
1922 

His Glu Leu Ile Pro Ala Val Met Thr Cys Ile Val Ser Arg Gln
Leu 
305 310 315



TGC CTG CGA CCA GAT GTG GAC AAT CAC TGG GCA CTC CGA GAC TTT
GCT 
1970 

Cys Leu Arg Pro Asp Val Asp Asn His Trp Ala Leu Arg Asp Phe 
Ala 

320 325 330 



GCC CGC CTG GTG GCC CAG ATC TGC AAG CAT TTT AGC ACA ACC ACT 
AAC 
2018 

Ala Arg Leu Val Ala Gln Ile Cys Lys His Phe Ser Thr Thr Thr
Asn 

335 340 345 

AAC ATC CAG TCC CGG ATC ACC AAG ACC TTC ACC AAG AGC TGG GTG 
GAC 
2066 

Asn Ile Gln Ser Arg Ile Thr Lys Thr Phe Thr Lys Ser Trp Val
Asp 

350 355 360 

365 

GAG AAG ACG CCC TGG ACG ACT CGT TAT GGC TCC ATC GCA GGC TTG 
GCT 
2114 

Glu Lys Thr Pro Trp Thr Thr Arg Tyr Gly Ser Ile Ala Gly Leu
Ala 

370 375 380 



GAG CTG GGA CAC GAT GTT ATC AAG ACT CTG ATT CTG CCC CGG CTG 
CAG 
2162 

Glu Leu Gly His Asp Val Ile Lys Thr Leu Ile Leu Pro Arg Leu
Gln

385 390 395



ACC TTC ACC AAG AGC TGG GTG GAC GAG AAG ACG CCC TGG ACG ACT 
CGT 
2210 

Thr Phe Thr Lys Ser Trp Val Asp Glu Lys Thr Pro Trp Thr Thr 
Arg 

400 405 410 



TAT GGC TCC AGG ATT GGA GCA GAC CAT GTG CAG AGC CTC CTG CTG 
AAA 
2258 
Tyr Gly Ser Arg Ile Gly Ala Asp His Val Gln Ser Leu Leu Leu
Lys 

415 420 425



CAC TGT GCT CCT GTT CTG GCA AAG CTG CGC CCA CCG CCT GAC AAT 
CAG 
2306 

His Cys Ala Pro Val Leu Ala Lys Leu Arg Pro Pro Pro Asp Asn 
Gln

430 435 440 

445 

GAC GCC TAT CGG GCA GAA TTC GGG TCC CTT GGG CCC CTC CTC TGC 
TCC 
2354 

Asp Ala Tyr Arg Ala Glu Phe Gly Ser Leu Gly Pro Leu Leu Cys 
Ser 

450 455 460 



CAG GTG GTC AAG GCT CGG GCC CAG GCT GCT CTG CAG GCT CAG CAG 
GTC 
2402 

Gln Val Val Lys Ala Arg Ala Gln Ala Ala Leu Gln Ala Gln Gln
Val 

465 470 475 



AAC AGG ACC ACT CTG ACC ATC ACG CAG CCC CGG CCC ACG CTG ACC 
CTC 
2450 

Asn Arg Thr Thr Leu Thr Ile Thr Gln Pro Arg Pro Thr Leu Thr
Leu 

480 485 490 



TCG CAG GCC CCA CAG CCT GGC CCT CGC ACC CCT GGC TTG CTG AAG 
GTT 
2498 

Ser Gln Ala Pro Gln Pro Gly Pro Arg Thr Pro Gly Leu Leu Lys
Val 

495 500 505 



CCT GGC TCC ATC GCA CTT CCT GTC CAG ACA CTG GTG TCT GCA CGA 
GCG 
2546 

Pro Gly Ser Ile Ala Leu Pro Val Gln Thr Leu Val Ser Ala Arg
Ala 

510 515 520 

525 
GCT GCC CCA CCA CAG CCT TCC CCT CCT CCC ACC AAG TTT ATT GTA
ATG 
2594 

Ala Ala Pro Pro Gln Pro Ser Pro Pro Pro Thr Lys Phe Ile Val
Met 

530 535 540 



TCA TCG TCC TCC AGC GCC CCA TCC ACC CAG CAG GTC CTG TCC CTC 
AGC 
2642 

Ser Ser Ser Ser Ser Ala Pro Ser Thr Gln Gln Val Leu Ser Leu
Ser 

545 550 555 



ACC TCG GCC CCC GGC TCA GGT TCC ACC ACC ACT TCG CCC GTC ACC 
ACC 
2690 

Thr Ser Ala Pro Gly Ser Gly Ser Thr Thr Thr Ser Pro Val Thr 
Thr 

560 565 570 



ACC GTC CCC AGC GTG CAG CCC ATC GTC AAG TTG GTC TCC ACC GCC 
ACC 
2738 

Thr Val Pro Ser Val Gln Pro Ile Val Lys Leu Val Ser Thr Ala
Thr 

575 580 585 



ACC GCA CCC CCC AGC ACT GCT CCC TCT GGT CCT GGG AGT GTC CAG 
AAG 
2786 

Thr Ala Pro Pro Ser Thr Ala Pro Ser Gly Pro Gly Ser Val Gln
Lys 

590 595 600 

605 

TAC ATC GTG GTC TCA CTT CCC CCA ACA GGG GAG GGC AAA GGA GGC 
CCC 
2834 

Tyr Ile Val Val Ser Leu Pro Pro Thr Gly Glu Gly Lys Gly Gly
Pro 

610 615 620 



ACC TCC CAT CCT TCT CCA GTT CCT CCC CCG GCA TCG TCC CCG TCC 
CCA 
2882 

Thr Ser His Pro Ser Pro Val Pro Pro Pro Ala Ser Ser Pro Ser 
Pro 
625 630 635

CTC AGC GGC ACT CGG GTT TGT GGG GGG AAG GAG GAG GCT GGG GAC
AGT
2930

Leu Ser Gly Ser Arg Val Cys Gly Gly Lys Gln Glu Ala Gly Asp
Ser 

640 645 650 



CCC CCT CCA GCT CCA GGG ACT CCA AAA GCC AAT GGC TCC CAG CCC 
AAC 
2978 

Pro Pro Pro Ala Pro Gly Thr Pro Lys Ala Asn Gly Ser Gln Pro
Asn 

655 660 665 

TGC GGC TCC CCT CAG CCT GCT CCG TGATGCTCCA CCTGCCAGCC 
CCCGGATTCC 
3032 

Cys Gly Ser Pro Gln Pro Ala Pro
670 675 

CACACATGCA GACATGTACA CACGTGCACG TACACACATG CATGCTCGCT 
AAGCGGAAGG 
3092 

AAGTTGTAGA TTGCTTCCTT CATGTCACTT TCTTTTTAGA TATTGTACAG 
CCAGTTTCTC 
3152 

AGAATAAAAG TTTGGTTTGT AAAAAAAAAA 
3182 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 677 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Ala Glu Glu Lys Lys Leu Lys Leu Ser Asn Thr Val Leu Pro 
Ser 
10



15 



Glu Ser Met Lys Val Val Ala Glu Ser Met Gly Ile Ala Gln Ile
Gln

20 25 30 



Glu Glu Thr Cys Gln Leu Leu Thr Asp Glu Val Ser Tyr Arg Ile
Lys 

35 40 45 

Glu Ile Ala Gln Asp Ala Leu Lys Phe Met His Met Gly Lys Arg
Gln

50 55 60 

Lys Leu Thr Thr Ser Asp Ile Asp Tyr Ala Leu Lys Leu Lys Asn
Val 

65 70 75 

80 

Glu Pro Leu Tyr Gly Phe His Ala Gln Asp Phe Ile Pro Phe Arg
Phe 

85 90 

95 

Ala Ser Gly Gly Gly Arg Glu Leu Tyr Phe Tyr Glu Glu Lys Glu 
Val 

100 105 110 



Asp Leu Ser Asp Ile Ile Asn Thr Pro Leu Pro Arg Val Pro Leu
Asp 

115 120 125 



Val Cys Leu Lys Ala His Trp Leu Ser Ile Glu Gly Cys Gln Pro
Ala 

130 135 140 



Ile Pro Glu Asn Pro Pro Pro Ala Pro Lys Glu Gln Gln Lys Ala
Glu 

145 150 155 

160 

Ala Thr Glu Pro Leu Lys Ser Ala Lys Pro Gly Gln Glu Glu Asp
Gly 

165 170 175 
Pro Leu Lys Gly Lys Gly Gln Gly Ala Thr Thr Ala Asp Gly Lys
Gly 

180 185 190 



Lys Glu Lys Lys Ala Pro Pro Leu Leu Glu Gly Ala Pro Leu Arg 
Leu 

195 200 205 



Lys Pro Arg Ser Ile His Glu Leu Ser Val Glu Gln Gln Leu Tyr
Tyr 

210 215 220 



Lys Glu Ile Thr Glu Ala Cys Val Gly Ser Cys Glu Ala Lys Arg
Ala 

225 230 235 

240 

Glu Ala Leu Gln Ser Ile Ala Thr Asp Pro Gly Leu Tyr Gln Met
Leu 

245 250 255 



Pro Arg Phe Ser Thr Phe Ile Ser Glu Gly Val Arg Val Asn Val
Val 

260 265 270 



Gln Asn Asn Leu Ala Leu Leu Ile Tyr Leu Met Arg Met Val Lys
Ala 

275 280 285 



Leu Met Asp Asn Pro Thr Leu Tyr Leu Glu Lys Tyr Val His Glu 
Leu 

290 295 300 



Ile Pro Ala Val Met Thr Cys Ile Val Ser Arg Gln Leu Cys Leu
Arg 

305 310 315 

320 

Pro Asp Val Asp Asn His Trp Ala Leu Arg Asp Phe Ala Ala Arg 
Leu 

325 330 335 



Val Ala Gln Ile Cys Lys His Phe Ser Thr Thr Thr Asn Asn Ile
Gln

340 345 350 
Ser Arg Ile Thr Lys Thr Phe Thr Lys Ser Trp Val Asp Glu Lys
Thr 

355 360 365 



Pro Trp Thr Thr Arg Tyr Gly Ser Ile Ala Gly Leu Ala Glu Leu
Gly 

370 375 380 



His Asp Val Ile Lys Thr Leu Ile Leu Pro Arg Leu Gln Thr Phe
Thr 

385 390 395 

400 

Lys Ser Trp Val Asp Glu Lys Thr Pro Trp Thr Thr Arg Tyr Gly 
Ser 

405 410 415 



Arg Ile Gly Ala Asp His Val Gln Ser Leu Leu Leu Lys His Cys
Ala 

420 425 430 



Pro Val Leu Ala Lys Leu Arg Pro Pro Pro Asp Asn Gln Asp Ala
Tyr 

435 440 445 



Arg Ala Glu Phe Gly Ser Leu Gly Pro Leu Leu Cys Ser Gln Val
Val 

450 455 460 



Lys Ala Arg Ala Gln Ala Ala Leu Gln Ala Gln Gln Val Asn Arg
Thr 

465 470 475 

480 

Thr Leu Thr Ile Thr Gln Pro Arg Pro Thr Leu Thr Leu Ser Gln
Ala 

485 490 495 



Pro Gln Pro Gly Pro Arg Thr Pro Gly Leu Leu Lys Val Pro Gly
Ser 

500 505 510 



Ile Ala Leu Pro Val Gln Thr Leu Val Ser Ala Arg Ala Ala Ala
Pro 

515 520 525 
Pro Gln Pro Ser Pro Pro Pro Thr Lys Phe Ile Val Met Ser Ser
Ser 

530 535 540 



Ser Ser Ala Pro Ser Thr Gln Gln Val Leu Ser Leu Ser Thr Ser
Ala 

545 550 555 

560 

Pro Gly Ser Gly Ser Thr Thr Thr Ser Pro Val Thr Thr Thr Val 
Pro 

565 570 575 



Ser Val Gln Pro Ile Val Lys Leu Val Ser Thr Ala Thr Thr Ala
Pro 

580 585 590 



Pro Ser Thr Ala Pro Ser Gly Pro Gly Ser Val Gln Lys Tyr Ile
Val 

595 600 605 



Val Ser Leu Pro Pro Thr Gly Glu Gly Lys Gly Gly Pro Thr Ser 
His 

610 615 620 



Pro Ser Pro Val Pro Pro Pro Ala Ser Ser Pro Ser Pro Leu Ser 
Gly 

625 630 635 

640 

Ser Arg Val Cys Gly Gly Lys Gln Glu Ala Gly Asp Ser Pro Pro
Pro 

645 650 655 



Ala Pro Gly Thr Pro Lys Ala Asn Gly Ser Gln Pro Asn Cys Gly
Ser 

660 665 670 



Pro Gln Pro Ala Pro
675 
dTAFII250 amino acid sequence

MGPGCDLLLRTAATTTAAAIMSDTPSDEDSAGGGPFSLAGFLFGNINGAGQLEGESV 

IX)DECKKJILAGLGALGLGSUTELTANEELTGTOGALVM)EGWVRSTEDAVDYSDIN 

EVAEDESRRYQQTMGSLQPLCHSDYDEDDYDAIXEDroCKLM»PPPPPPGPMKKDKD 

QDSITGEK\TDFSSSSDSESEMGPQEATpAESEDGKLTLPUkGIMQHDATmPSVTCL 

FPEFRPGKVLRFIJIIJ?GPGKNWS\^SARRKRKKKHRELIQEEQIQEVECS\^SEV 

QKSLWNYDYAPPPPPEQCLSDDEITMMAPVESKFSQSTGDroKXTDTKPRVAEW 

GPARLWYDMIXjVPEIXjSGFDYGFKUUCTEHEPVIKSRMIEEFRKL^^ 

NFlJ^mX3LHWEDDIIWDGEDVKHKGTKPQRASlJ^GWU>SSM^^ 

AATlJ)DDKPWYSIFProNEDLVYGRWEDNIIWDAQAMPRIXEPPVLTLDPNDENLI 

I£IPDEKEEATSNSPSKESKKESSLKKSRIIXGKTGVIKEEP(X?NMSQPENiaDPWNl^N 

DEYYYPKQQGIJ^GTFGGMQHSPA\nSLRQPFFPTHMGPIKlJlQFHRPPLKK^ 

1^QPGPHSVQPIIJCHIKKKAKMREQERQASGGGEMFFMRTPQDLTCKIX}D 

EENGPLMMQVGNUTiaKNYYKRKPGKDPGAPDCXYGErVYanSP 

QAFE^MJ^lAPIYIJ^KMPETDI=LImTRQGYYIRELVDIFWGQQCPL^ 

ANTHIRDFLQVHYlUJ^SKDRPRRmMEDIKKJiJ=PSHSESSIRKRlJ^ 

MDSNWWVLKSDFlUJriEEHRAMVSPEQCCAYYSMIAAEQRLKDAGYGEKSFFAPE 

EENEEDFQMKIDDEVRTAPWNTTRAHAAMKGKa-lX\m3VADPTGC^^^ 

PNKPTQQKDDKEPQP\TaCTVTGTOADLRRl^LKNAKQIXRKFG\^^ 

mVWTMSTEQARSGEGPMSKFARGSRFSVAEHQERYKEECQRIFDLQNKVLSSTEVL 

STOTDSSSAEDSDFEEMGKNIENMU3NKKTSSQLSREREEQERKELQRMU^ 

GNN HRDDD TASVTSIJ^SSATGRQJaYRTTRDEEGKEYWCErVRK^ 

TIKDEEFIRKFAIJDEQHREEMRKERRRIQEQLRRUaWQEKEKLKGPPEKKPKK\^ 

PDIJaKCGACGAIGHMRTNKFCPLYYQTNAPPSM>VAMTEEQEEEl£KTVIH^ 

LnCVEGTKIVlX3KQIJESADE\TUlKSL\l-KIT>KQQU>PKKKRR 

SIHRRRTDP^WTLSSIIJESII^roMRDLPNTYPFHTPVNAKV\^DYYKIITRPMDLQT 

IJIENVRKRLYPSREEFREHLE1JVKNSATYNGPKHSLTQISQSMLDL(^ 

ARl^KAINPIiDDDDQVAFSFILDhmTQKMMAVPDSWPFHHPVNKKFWDY^ 

IVhO'MDLETIRKMSKHKYQSRESFlJDDVNLllANSVKYNGPESQYTO 

QlLWDEHLTQLEKDICTAKEAAlJEEAEL£SLDPMTPGPYTPQPPDLYmT^ 

RDASVFQDESNMSVUDIPSATPEKQVapEGEDGDGDlJUDEEEGTVQQPQASVLYEDL 

U^EGEDDEEDAGSDEEGDNPFSAJQI^GSDSDVGSGGIRPKQPRMLQENTTIMDME 

NEESMMSYEGIXK3EASHGLEDSMSYGSYEEPDPKSNTQDTSFSSIGGYEVSEEEEDEEE 
EEQRSGPSVLSQVHLSEDEEDSEDFHSIAGDSDLDSDE 
Sequence Range: 1 to 2214

5 10 15 20 25 30 35 40 45 
* ★ * * 

AGA GGT GGT GCA GGC GGC GGC CCC GGC GGC GCA GAC CCT GGC GCC AGC 
Arg Gly Gly Ala Gly Gly Ala Pro Gly Gly Ala Asp Pro Gly Ala Ser> 
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

50 55 60 65 70 75 80 85 90 95 

* ★ * - * * 

GGC CCG GCC AGC ACG GCG GCC AGC ATG GTC ATC GGG CCA ACT ATG CAA 
Gly Pro Ala Ser Thr Ala Ala Ser Met Val Ile Gly Pro Thr Met Gln>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

100 105 110 115 120 125 130 135 140 
* * ★ * * 

GGG CGC TGC CCA GCC CGG CCG CCG TCC CGC CGC CCG CCC CCG GGA CCC 
Gly Arg Cys Pro Ala Arg Pro Pro Ser Arg Arg Pro Pro Pro Gly Pro> 
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

145 150 155 160 165 170 175 180 185 190 
* * * ★ * 

CCA CCG GGC TGC CCA AAA GGC GCG GCC GGC GCA GTG ACC CAG AGC CTG 
Pro Pro Gly Cys Pro Lys Gly Ala Ala Gly Ala Val Thr Gln Ser Leu>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

195 200 205 210 215 220 225 230 235 240

* * * * *

TCC CGG ACG CCC ACG GCC ACC ACC AGC GGG ATT CGG GCC ACC CTG ACG
Ser Arg Thr Pro Thr Ala Thr Thr Ser Gly Ile Arg Ala Thr Leu Thr>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

245 250 255 260 265 270 275 280 285 
* * * • * * 

CCC ACC GTG CTG GCC CCC CGC TTG CCG CAG CCG CCT CAG AAC CCG ACC 
Pro Thr Val Leu Ala Pro Arg Leu Pro Gln Pro Pro Gln Asn Pro Thr>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

290 295 300 305 310 315 320 325 330 335 

* ★ ♦ * * 

AAC ATC CAG AAC TTC CAG CTG CCC CCA GGA ATG GTC CTC GTC CGA AGT
Asn Ile Gln Asn Phe Gln Leu Pro Pro Gly Met Val Leu Val Arg Ser>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

340 345 350 355 360 365 370 375 380 
« * * * * 

GAG AAT GGG CAG TTG TTA ATC ATT CCT CAG CAG GCC TTC GCC CAG ATC
Glu Asn Gly Gln Leu Leu Met Ile Pro Gln Gln Ala Leu Ala Gln Met>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

385 390 395 400 405 410 415 420 425 430 
-k ★ ♦ * * 

CAG GCG CAG GCC CAT GCC CAG CCT CAG ACC ACC ATC GCG CCT CGC CCT 
Gln Ala Gln Ala His Ala Gln Pro Gln Thr Thr Met Ala Pro Arg Pro>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

435 440 445 450 455 460 465 470 475 480

* * * * *

GCC ACC CCC ACA AGT GCC CCT CCC GTC CAG ATC TCC ACC GTA CAG GCA
Ala Thr Pro Thr Ser Ala Pro Pro Val Gln Ile Ser Thr Val Gln Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >
485 490 495 500 505 510 515 520 525
* * * *

CCT GGA ACA CCT ATC ATT GCA CGG CAG GTG ACC CCA ACT ACC ATA ATT
Pro Gly Thr Pro Ile Ile Ala Arg Gln Val Thr Pro Thr Thr Ile Ile>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

530 535 540 545 550 555 560 565 570 575 
* * * * * 

AAG CAA GTG TCT CAG GCC CAG ACA ACG GTG CAG CCC ACT GCA ACC CTC 
Lys Gln Val Ser Gln Ala Gln Thr Thr Val Gln Pro Ser Ala Thr Leu>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

580 585 590 595 600 605 610 615 620 
* * « * * . 

CAG CGC TCG CCC GGC GTC CAG CCT CAG CTC GTT CTG GGT GGC GCT GCC 
Gln Arg Ser Pro Gly Val Gln Pro Gln Leu Val Leu Gly Gly Ala Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

625 630 635 640 645 650 655 660 665 670 

* . * * it ^ 

CAG ACG GCT TCA CTT GGG ACG GCG ACG GCT GTT CAG ACG GGG ACT CCT 
Gln Thr Ala Ser Leu Gly Thr Ala Thr Ala Val Gln Thr Gly Thr Pro>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

675 680 685 690 695 700 705 710 715 720 

H * it * it 

CAG CGC ACG GTA CCA GGG GCG ACC ACC ACT TCC TCA GCT GCC ACG GAA 
Gln Arg Thr Val Pro Gly Ala Thr Thr Thr Ser Ser Ala Ala Thr Glu>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

725 730 735 740 745 750 755 760 765 
* * * * 

ACT ATG GAA AAC GTG AAG AAA TGT AAA AAT TTC CTA TCT ACG TTA ATA 
Thr Met Glu Asn Val Lys Lys Cys Lys Asn Phe Leu Ser Thr Leu Ile> 
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

770 775 780 785 790 795 800 805 810 815 
* ★ * * if 

AAA CTG GCT TCA TCT GGC AAG CAG TCT ACA GAG ACA GCA GCT AAT GIQ 
Lys Leu Ala Ser Ser Gly Lys Gln Ser Thr Glu Thr Ala Ala Asn Val>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

820 825 830 835 840 845 850 855 860 
★ * ★ * * 

AAA GAG CTC GTG CAG AAT TTA CTG GAT GGA AAA ATA GAA GCA GAA GAT 
Lys Glu Leu Val Gln Asn Leu Leu Asp Gly Lys Ile Glu Ala Glu Asp>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

865 870 875 880 885 890 895 900 905 910 
* * * * * 

TTC ACA AGC AGG TTA TAC CGA GAA CTT AAT TCT TCA CCT CAA CCT TAC
Phe Thr Ser Arg Leu Tyr Arg Glu Leu Asn Ser Ser Pro Gln Pro Tyr>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

915 920 925 930 935 940 945 950 955 960 
* * * * ^ 

CTT GTG CCT TTC CTG AAG AGG AGC TTA CCC GCC TTG AGA CAG CTC ACC 
Leu Val Pro Phe Leu Lys Arg Ser Leu Pro Ala Leu Arg Gln Leu Thr>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

965 970 975 980 985 990 995 1000 1005 
CCC GAC TCC GCG GCC TTC ATC CAG GAG AGC GAG GAG GAG CCG GCA CCG
Pro Asp Ser Ala Ala Phe Ile Gln Gln Ser Gln Gln Gln Pro Pro Pro>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1010 1015 1020 1025 1030 1035 1040 1045 1050 1055
* * * * *

CCC ACC TCG CAG GCC ACC ACT GCG CTC ACG GCC GTG GTG CTG AGT AGC
Pro Thr Ser Gln Ala Thr Thr Ala Leu Thr Ala Val Val Leu Ser Ser>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1060 1065 1070 1075 1080 1085 1090 1095 1100 
* * ♦ * * 

TCG GTC CAG CGC ACG GCC GGG AAG ACG GCG GCC ACC GTG ACC AGT GCC 
Ser Val Gln Arg Thr Ala Gly Lys Thr Ala Ala Thr Val Thr Ser Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1105 1110 1115 1120 1125 1130 1135 1140 1145 1150
* * * * *

CTC CAG CCC CCT GTG CTC AGC CTC ACG CAG CCC ACG CAG GTC GGC GTC
Leu Gln Pro Pro Val Leu Ser Leu Thr Gln Pro Thr Gln Val Gly Val>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1155 1160 1165 1170 1175 1180 1185 1190 1195 1200 



* 



GGC AAG CAG GGG CAA CCC ACA CCG CTG GTC ATC CAG CAG CCT CCG AAG 
Gly Lys Gln Gly Gln Pro Thr Pro Leu Val Ile Gln Gln Pro Pro Lys>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1205 1210 1215 1220 1225 1230 1235 1240 1245 
* * * * 

CCA GGA GCC CTG ATC CGG CCC CCG CAG GTG ACG TTG ACG CAG ACA CCC 
Pro Gly Ala Leu Ile Arg Pro Pro Gln Val Thr Leu Thr Gln Thr Pro>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1250 1255 1260 1265 1270 1275 1280 1285 1290 1295 
* * * * * 

ATG GTC GCC CTG CGG CAG CCT CAC AAC CGG ATC ATG CTC ACC ACG CCT 
Met Val Ala Leu Arg Gln Pro His Asn Arg Ile Met Leu Thr Thr Pro>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1300 1305 1310 1315 1320 1325 1330 1335 1340 
* * * * * 

CAG CAG ATC CAG CTC AAC CCA CTC CAG CCA GTC CCT GTC GTC AAA CCC 
Gln Gln Ile Gln Leu Asn Pro Leu Gln Pro Val Pro Val Val Lys Pro>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1345 1350 1355 1360 1365 1370 1375 1380 1385 1390 
* . * * * 
GCC GTC TTA CCT GGA ACC AAA GCC CTT TCT GCT GTC TCG GCA CAA GCA 
Ala Val Leu Pro Gly Thr Lys Ala Leu Ser Ala Val Ser Ala Gln Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1395 1400 1405 1410 1415 1420 1425 1430 1435 1440 
* * ' * ♦ * 
GCT GCT GCA CAG AAA AAT AAA CTC AAG GAG CCT GGG GGA GGT TCG TTT
Ala Ala Ala Gln Lys Asn Lys Leu Lys Glu Pro Gly Gly Gly Ser Phe>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1445 1450 1455 1460 1465 1470 1475 1480 1485 

CGG GAC GAT GAT GAC ATT AAT GAT GTT GCA TCG ATC GCT GGA GTA AAC
Arg Asp Asp Asp Asp Ile Asn Asp Val Ala Ser Met Ala Gly Val Asn>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1490 1495 1500 1505 1510 1515 1520 1525 1530 1535
* * * * *

TTG TCA GAA GAA AGT GCA AGA ATA TTA GCC ACG AAC TCT GAA TTG GTG
Leu Ser Glu Glu Ser Ala Arg Ile Leu Ala Thr Asn Ser Glu Leu Val>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1540 1545 1550 1555 1560 1565 1570 1575 1580 
* * * * it 

GGC ACG CTA ACG CGG TCC TGT AAA GAT GAA ACC TTC CTC CTC CAA GCG 
Gly Thr Leu Thr Arg Ser Cys Lys Asp Glu Thr Phe Leu Leu Gln Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1585 1590 1595 1600 1605 1610 1615 1620 1625 1630 
* * * * « 

CCT TTG CAG AGA AGA ATA TTA GAA ATA GGT AAA AAA CAT GGT ATA ACG 
Pro Leu Gln Arg Arg Ile Leu Glu Ile Gly Lys Lys His Gly Ile Thr>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1635 1640 1645 1650 1655 1660 1665 1670 1675 1680
* * * * *

GAA TTA CAT CCA GAT GTA GTA AGT TAT GTA TCA CAT GCC ACG CAA CAA
Glu Leu His Pro Asp Val Val Ser Tyr Val Ser His Ala Thr Gln Gln>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1685 1690 1695 1700 1705 1710 1715 1720 1725 
* ♦ ★ * 

AGG CTA CAG AAT CTT GTA GAG AAA ATA TCA GAA ACA GCT CAG CAG AAG 
Arg Leu Gln Asn Leu Val Glu Lys Ile Ser Glu Thr Ala Gln Gln Lys>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1730 1735 1740 1745 1750 1755 1760 1765 1770 1775 
♦ * ★ * * - . 

AAC TTT TCT TAC AAG GAT GAC GAC AGA TAT GAG CAG GCG AGT GAC GTC 
Asn Phe Ser Tyr Lys Asp Asp Asp Arg Tyr Glu Gln Ala Ser Asp Val>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1780 1785 1790 1795 1800 1805 1810 1815 1820 
* * * * * 

CGG GCA CAG CTC AAG TTT TTT GAA CAG CTT GAT CAA ATC GAA AAG CAG 
Arg Ala Gln Leu Lys Phe Phe Glu Gln Leu Asp Gln Ile Glu Lys Gln>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1825 1830 1835 1840 1845 1850 1855 1860 1865 1870 
★ * * * It . 

AGG AAG GAT GAG CAG GAG CGG GAG ATC CTG ATG AGG GCA GCA AAG TCT 
Arg Lys Asp Glu Gln Glu Arg Glu Ile Leu Met Arg Ala Ala Lys Ser>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1875 1880 1885 1890 1895 1900 1905 1910 1915 1920 
★ * * * * 

CGG TCA AGA CAA GAA GAT CCA GAA CAG TTA AGG CTG AAA CAG AAG GCA 
Arg Ser Arg Gln Glu Asp Pro Glu Gln Leu Arg Leu Lys Gln Lys Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

1925 1930 1935 1940 1945 1950 1955 1960 1965 
★ - * * * 

AAG GAG ATG CAG CAA CAG GAA CTG GCA CAA ATG AGA CAG CGG GAC GCC 
Lys Glu Met Gln Gln Gln Glu Leu Ala Gln Met Arg Gln Arg Asp Ala>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >
1970 1975 1980 1985 1990 1995 2000 2005 2010 2015
* * * * *

AAC CTC ACA GCA CTA GCA GCG ATC GGG CCC AGG AAA AAG AGG AAA GTG
Asn Leu Thr Ala Leu Ala Ala Ile Gly Pro Arg Lys Lys Arg Lys Val>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

2020 2025 2030 2035 2040 2045 2050 2055 2060 
* * * * * 

GAG TGT CCG GGG GCG GGG TCA GGA GCA GAG GGG TCG GGC CCC GGC TCA 
Asp Cys Pro Gly Pro Gly Ser Gly Ala Glu Gly Ser Gly Pro Gly Ser> 
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

2065 2070 2075 2080 2085 2090 2095 2100 2105 2110
* * * * *

GTG GTC CCA GGC AGC TCG GGT GTC GGA ACC CCC AGA CAG TTC ACQ CGA
Val Val Pro Gly Ser Ser Gly Val Gly Thr Pro Arg Gln Phe Thr Arg>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

2115 2120 2125 2130 2135 2140 2145 2150 2155 2160 
* * * * ^ 

CAA AGA ATC ACG CGG GTC AAC CTC AGG GAC CTC ATA TTT TGT TTA GAA 
Gln Arg Ile Thr Arg Val Asn Leu Arg Asp Leu Ile Phe Cys Leu Glu>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

2165 2170 2175 2180 2185 2190 2195 2200 2205 
* * * * 

AAT GAA CGT GAG ACA AGC CAT TCA CTG CTG CTC TAC AAA GCA TTC CTT 
Asn Glu Arg Glu Thr Ser His Ser Leu Leu Leu Tyr Lys Ala Phe Leu>
a a a TRANSLATION OF HTAF130 DNA [A] a a a a >

2210 

AAG TGA 
Lys ***> 

a > 
5 10 15 20 25 30 35 40 45 50
***** 
RGGAGGAPGG ADPGASGPAS TAASMVIGPT MQGRCPARPP SRRPPPGPPP 

55 60 65 70 75 80 85 90 95 100 

* * * * * 
GCPKGAAGAV TQSLSRTPTA TTSGIRATLT PTVLAPRLPQ PPQNPTNIQN 

105 110 115 120 125 130 135 140 145 150

* * * * * 
FQLPPGMVLV RSENGQLLMI PQQALAQMQA QAHAQPQTTM APRPATPTSA

155 160 165 170 175 180 185 190 195 200 
***** 
PPVQISTVQA PGTPIIARQV TPTTIIKQVS QAQTTVQPSA TLQRSPGVQP

205 210 215 220 225 230 235 240 245 250 

* * * * * 

QLVLGGAAQT ASLGTATAVQ TGTPQRTVPG ATTTSSAATE TMENVKKCKN

255 260 265 270 275 280 285 290 295 300 

* * * ♦ * 

FLSTLIKLAS SGKQSTETAA NVKELVQNLL DGKIEAEDFT SRLYRELNSS 

305 310 315 320 325 330 335 340 345 350 

* * • * * * 
PQPYLVPFLK RSLPALRQLT PDSAAFIQQS QQQPPPPTSQ ATTALTAVVL

355 360 365 370 375 380 385 390 395 400 

* *^ * ♦ * 
SSSVQRTAGK TAATVTSALQ PPVLSLTQPT QVGVGKQGQP TPLVIQQPPK 

405 410 415 420 425 430 435 440 445 450 
***** 
PGALIRPPQV TLTQTPMVAL RQPHNRIMLT TPQQIQLNPL QPVPVVKPAV

455 460 465 470 475 480 485 490 495 500 
***** 
LPGTKALSAV SAQAAAAQKN KLKEPGGGSF RDDDDINDVA SMAGVNLSEE 

505 510 515 520 525 530 535 540 545 550 

* * . * * 

SARILATNSE LVGTLTRSCK DETFLLQAPL QRRILEIGKK HGITELHPDV 

555 560 565 570 575 580 585 590 595 600 

* * * * * 
VSYVSHATQQ RLQNLVEKIS ETAQQKNFSY KDDDRYEQAS DVRAQLKFFE

605 610 615 620 625 630 635 640 645 650 

* * * * * 
QLDQIEKQRK DEQEREILMR AAKSRSRQED PEQLRLKQKA KEMQQQELAQ 

655 660 665 670 675 680 685 690 695 700 

* * * . * * 

MRQRDANLTA LAAIGPRKKR KVDCPGPGSG AEGSGPGSVV PGSSGVGTPR

705 710 715 720 725 730 735 

* * * 

QFTRQRITRV NLRDLIFCLE NERETSHSLL LYKAFLK* 
Sequence Range: 1 to 2152

5 10 15 20 25 30 35 40 45
* * * *

CTA CTG GCC GTG CTG CAG TTC CTA CGG CAG AGC AAA CTC CGC GAG GCC
Leu Leu Ala Val Leu Gln Phe Leu Arg Gln Ser Lys Leu Arg Glu Ala>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

50 55 60 65 70 75 80 85 90 95 
* * ♦ . * * 

GAA GAG GCG CTG CGC CGT GAG GCC GGG CTG CTG GAG GAG GCA GTG GCG 
Glu Glu Ala Leu Arg Arg Glu Ala Gly Leu Leu Glu Glu Ala Val Ala> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

100 105 110 115 120 125 130 135 140 
* * * * ★ 

GGC TCC GGA GCC CCG GGA GAG GTG GAC AGC GCC GGC GCT GAG GTG ACC 
Gly Ser Gly Ala Pro Gly Glu Val Asp Ser Ala Gly Ala Glu Val Thr> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

145 150 155 160 165 170 175 180 185 190 
* * * * * 

AGC GCG CTT CTC AGC CGG GTG ACC GCC TCG GCC CCT GGC CCT GCG GCC 
Ser Ala Leu Leu Ser Arg Val Thr Ala Ser Ala Pro Gly Pro Ala Ala> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

195 200 205 210 215 220 225 230 235 240
* * * * *

CCC GAC CCT CCG GGC ACT GGC GCT TCG GGG GCC ACG GTC GTC TCA GOT
Pro Asp Pro Pro Gly Thr Gly Ala Ser Gly Ala Thr Val Val Ser Gly>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

245 250 255 260 265 270 275 280 285 
* * * * 

TCA GCC TCA GGT CCT GCG GCT CCG GGT AAA GTT GGA AGT GTT GCT GTG 
Ser Ala Ser Gly Pro Ala Ala Pro Gly Lys Val Gly Ser Val Ala Val> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

290 295 300 305 310 315 320 325 330 335 
* * * * * 

GAA GAC CAG CCA GAT GTC AGT GCC GTG TTG TCA GCC TAC AAC CAA CAA 
Glu Asp Gln Pro Asp Val Ser Ala Val Leu Ser Ala Tyr Asn Gln Gln>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

340 345 350 355 360 365 370 375 380 
★ * * * * 

GGA GAT CCC ACA ATG TAT GAA GAA TAC TAT AGT GGA CTG AAA CAC TTC 
Gly Asp Pro Thr Met Tyr Glu Glu Tyr Tyr Ser Gly Leu Lys His Phe> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

385 390 395 400 405 410 415 420 425 430 
* ★ * * « 

ATT GAA TGT TCC CTG GAC TGC CAT CGG GCA GAG TTG TCC CAA CTT TTT 
Ile Glu Cys Ser Leu Asp Cys His Arg Ala Glu Leu Ser Gln Leu Phe>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

435 440 445 450 455 460 465 470 475 480 

* * it * ^ 

TAT CCT CTG TTT GTG CAC ATG TAC TTG GAG CTA GTC TAC AAT CAA CAT 
Tyr Pro Leu Phe Val His Met Tyr Leu Glu Leu Val Tyr Asn Gln His>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >
485 490 495 500 505 510 515 520 525
* * * *

GAG AAT GAA GCA AAG TCA TTC TTT GAG AAG TTC CAT GGA GAT GAG GAA
Glu Asn Glu Ala Lys Ser Phe Phe Glu Lys Phe His Gly Asp Gln Glu>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

530 535 540 545 550 555 560 565 570 575 
* * * * ★ 

TGT TAT TAG GAG GAT GAG CTA GGA GTA TTA TCT AGT CTT ACC AAA AAG 
Cys Tyr Tyr Gln Asp Asp Leu Arg Val Leu Ser Ser Leu Thr Lys Lys>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

580 585 590 595 600 605 610 615 620 
* * * * * 

GAA CAC ATG AAA GGG AAT GAG ACC ATG TTG GAT TTT CGA ACA AGT AAA 
Glu His Met Lys Gly Asn Glu Thr Met Leu Asp Phe Arg Thr Ser Lys>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

625 630 635 640 645 650 655 660 665 670 
* * * ★ . * 

TTT GTT CTG CGT ATT TCC CGT GAC TCG TAC CAA CTC TTG AAG AGG CAT
Phe Val Leu Arg Ile Ser Arg Asp Ser Tyr Gln Leu Leu Lys Arg His>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

675 680 685 690 695 700 705 710 715 720 
* * * ♦ * 

CTT CAG GAG AAA GAG AAC AAT CAG ATA TGG AAC ATA GTT CAG GAG CAC 
Leu Gln Glu Lys Gln Asn Asn Gln Ile Trp Asn Ile Val Gln Glu His>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

725 730 735 740 745 750 755 760 765 
* * ★ * 

CTC TAC ATT GAC ATC TTT GAT GGG ATG CCG CGT AGT AAG CAA CAG ATA 
Leu Tyr Ile Asp Ile Phe Asp Gly Met Pro Arg Ser Lys Gln Gln Ile>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

770 775 780 785 790 795 800 805 810 815
* * * * *

GAT GGG ATG GTG GGA AGT TTG GCA GGA GAG GCT AAA CGA GAG GCA AAC
Asp Ala Met Val Gly Ser Leu Ala Gly Glu Ala Lys Arg Glu Ala Asn>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

820 825 830 835 840 845 850 855 860 
* * * * ♦ 

AAA TCA AAG GTA TTT TTT GGT TTA TTA AAA GAA CCA GAA ATT GAG GTA 
Lys Ser Lys Val Phe Phe Gly Leu Leu Lys Glu Pro Glu Ile Glu Val>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

865 870 875 880 885 890 895 900 905 910 
* * * * ♦ 

CGT TTG GAT GAC GAG GAT GAA GAG GGA GAA AAT GAA GAA GGA AAA CCT 
Pro Leu Asp Asp Glu Asp Glu Glu Gly Glu Asn Glu Glu Gly Lys Pro>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

915 920 925 930 935 940 945 950 955 960 
. * . * * * * 
AAA AAG AAG AAG CCT AAA AAA GAT AGT ATT GGA TCC AAA AGC AAA AAA 
Lys Lys Lys Lys Pro Lys Lys Asp Ser Ile Gly Ser Lys Ser Lys Lys>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

965 970 975 980 985 990 995 1000 1005

CAA GAT CCC AAT GCT CCA CCT CAG AAC AGA ATC CCT CTT CCT GAG TTG
Gln Asp Pro Asn Ala Pro Pro Gln Asn Arg Ile Pro Leu Pro Glu Leu>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1010 1015 1020 1025 1030 1035 1040 1045 1050 1055 
★ * * * * ' 

AAA GAT TCA GAT AAG TTG GAT AAG ATA ATG AAT ATG AAA GAA ACC ACC 
Lys Asp Ser Asp Lys Leu Asp Lys Ile Met Asn Met Lys Glu Thr Thr>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1060 1065 1070 1075 1080 1085 1090 1095 1100 
* * * * if 

AAA CGA GTA CGC CTT GGG CCG GAC TGC TTA CCC TCC ATT TGT TTC TAT 
Lys Arg Val Arg Leu Gly Pro Asp Cys Leu Pro Ser Ile Cys Phe Tyr>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1105 1110 1115 1120 1125 1130 1135 1140 1145 1150 
* * * * * 

ACA TTT CTC AAT GCT TAC CAG GGT CTC ACT GCA GTG GAT GTC ACT GAT 
Thr Phe Leu Asn Ala Tyr Gln Gly Leu Thr Ala Val Asp Val Thr Asp>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1155 1160 1165 1170 1175 1180 1185 1190 1195 1200 
* * ★ * ♦ 

GAT TCT AGT CTG ATT GCT GGA GGT TTT GCA GAT TCA ACT GTC AGA GTC 
Asp Ser Ser Leu Ile Ala Gly Gly Phe Ala Asp Ser Thr Val Arg Val>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1205 1210 1215 1220 1225 1230 1235 1240 1245 
★ ★ * . * 

TGG TCG GTA ACA CCC AAA AAG CTT CGT AGT GTC AAA CAA GCA TCA GAT 
Trp Ser Val Thr Pro Lys Lys Leu Arg Ser Val Lys Gln Ala Ser Asp>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1250 1255 1260 1265 1270 1275 1280 1285 1290 1295 
* * * * ♦ 

CTT AGT CTT ATA GAC AAA GAA TCA GAT GAT GTC TTA GAA AGA ATC ATC 
Leu Ser Leu Ile Asp Lys Glu Ser Asp Asp Val Leu Glu Arg Ile Met>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1300 1305 1310 1315 1320 1325 1330 1335 1340 
* ★ * * 4 

GAT GAG AAA ACA GCA AGT GAG TTC AAG ATT TTC TAT GGT CAC AGT GGG 
Asp Glu Lys Thr Ala Ser Glu Leu Lys Ile Leu Tyr Gly His Ser Gly>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1345 1350 1355 1360 1365 1370 1375 1380 1385 1390 
* * * * It 

CCT GTC TAC GGA GCC AGC TTC AGT CCG GAT AGG AAC TAT CTG CTT TCC 
Pro Val Tyr Gly Ala Ser Phe Ser Pro Asp Arg Asn Tyr Leu Leu Ser>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1395 1400 1405 1410 1415 1420 1425 1430 1435 1440 
* * ♦ ♦ * 

TCT TCA GAG GAC GGA ACT GTT AGA TTC TGG AGC CTT CAA ACA TTT ACT 
Ser Ser Glu Asp Gly Thr Val Arg Leu Trp Ser Leu Gln Thr Phe Thr>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1445 1450 1455 1460 1465 1470 1475 1480 1485 
* * * * 

TCT TTC GTC GGA TAT AAA GGA CAC AAC TAT CCA GTA TGG GAC ACA CAk
Cys Leu Val Gly Tyr Lys Gly His Asn Tyr Pro Val Trp Asp Thr Gln>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1490 1495 1500 1505 1510 1515 1520 1525 1530 1535 
* * * * * 

TTT TCN CCA TAT GGA TAT TAT TTT GTG TCA GGG GGC CAT GAC CGA GTA 
Phe Ser Pro Tyr Gly Tyr Tyr Phe Val Ser Gly Gly His Asp Arg Val>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1540 1545 1550 1555 1560 1565 1570 1575 1580 

* * * * * 

GCT CGG CTC TGG GCT ACA GAC CAC TAT CAG CCT TTA AGA ATA TTT GCC 
Ala Arg Leu Trp Ala Thr Asp His Tyr Gln Pro Leu Arg Ile Phe Ala>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1585 1590 1595 1600 1605 1610 1615 1620 1625 1630

* * * * *

GGC CAT CTT GCT GAT GTG AAT TGT ACC AGA TTC CAT CCA AAT TCT AAT
Gly His Leu Ala Asp Val Asn Cys Thr Arg Phe His Pro Asn Ser Asn>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1635 1640 1645 1650 1655 1660 1665 1670 1675 1680 

★ * * * • . * 

TAT GTT GCT ACG GGC TCT GCA GAC AGA ACT GTG CGG CTC TGG GAC GTC 
Tyr Val Ala Thr Gly Ser Ala Asp Arg Thr Val Arg Leu Trp Asp Val> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1685 1690 1695 1700 1705 1710 1715 1720 1725 

♦ ★ ★ . * 

CTC AAT GGT AAC TCT GTA AGG ATC TTC ACT GGA CAC AAG GGA CCA ATT 
Leu Asn Gly Asn Cys Val Arg Ile Phe Thr Gly His Lys Gly Pro Ile>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1730 1735 1740 1745 1750 1755 1760 1765 1770 1775 
★ * * . * * . 

CAT TCC TTC ACA TTT TCT CCC AAT GGG AGA TTC CTC GCT ACA GGA GCA 
His Ser Leu Thr Phe Ser Pro Asn Gly Arg Phe Leu Ala Thr Gly Ala> 
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1780 1785 1790 1795 1800 1805 1810 1815 1820 

* ★ ♦ * * ■ 

ACA GAT GGC AGA GTC CTT CTT TGG GAT ATT GGA CAT GGT TTC ATC GTT 
Thr Asp Gly Arg Val Leu Leu Trp Asp Ile Gly His Gly Leu Met Val>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1825 1830 1835 1840 1845 1850 1855 1860 1865 1870 

★ * * * * 

GGA GAA TTA AAA GGC CAC ACT GAT ACA GTC TGT TCA CTT AGG TTT AGT 
Gly Glu Leu Lys Gly His Thr Asp Thr Val Cys Ser Leu Arg Phe Ser>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1875 1880 1885 1890 1895 1900 1905 1910 1915 1920 

* * * * * 

AGA GAT GGT GAA ATT TTC GCA TCA GGT TCA ATC GAT AAT ACA GTT CGA 
Arg Asp Gly Glu Ile Leu Ala Ser Gly Ser Met Asp Asn Thr Val Arg>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

1925 1930 1935 1940 1945 1950 1955 1960 1965 

* * * • * 

TTA TGG GAT GCT ATC AAA GCC TTT GAA GAT TTA GAG ACC GAT GAC TTT 
Leu Trp Asp Ala Ile Lys Ala Phe Glu Asp Leu Glu Thr Asp Asp Phe>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >
1970 1975 1980 1985 1990 1995 2000 2005 2010 2015
* * * * *

ACT ACA GCC ACT GGG CAT ATA AAT TTA CCT GAG AAT TCA CAG GAG TTA
Thr Thr Ala Thr Gly His Ile Asn Leu Pro Glu Asn Ser Gln Glu Leu>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

2020 2025 2030 2035 2040 2045 2050 2055 2060 
* * * * * 

TTG TTG GGA ACA TAT ATG ACC AAA TCA ACA CCA GTT GTA CAC CTT CAT 
Leu Leu Gly Thr Tyr Met Thr Lys Ser Thr Pro Val Val His Leu His>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

2065 2070 2075 2080 2085 2090 2095 2100 2105 2110 
★ * ♦ * it 

TTT ACT CGA AGA AAC CTG GTT CTA GCT GCA GGA GCT TAT AGT CCA CAA 
Phe Thr Arg Arg Asn Leu Val Leu Ala Ala Gly Ala Tyr Ser Pro Gln>
a a a TRANSLATION OF HTAF100 DNA [A] a a a a >

2115 2120 2125 2130 2135 2140 2145 2150 
★ * * * 

TAA ACCAT CGGTATTAAA GACCAAAAAA AAAAAAAAAA AA 

***> 

> 
5 10 15 20 25 30 35 40 45 50

* * . ★ * * 

LLAVLQFLRQ SKLREAEEAL RREAGLLEEA VAGSGAPGEV DSAGAEVTSA

55 60 65 70 75 80 85 90 95 100
***** 

LLSRVTASAP GPAAPDPPGT GASGATVVSG SASGPAAPGK VGSVAVEDQP

105 110 115 120 125 130 135 140 145 150 
***** 

DVSAVLSAYN QQGDPTMYEE YYSGLKHFIE CSLDCHRAEL SQLFYPLFVH 

155 160 165 170 175 180 185 190 195 200 

* * * * * 

MYLELVYNQH ENEAKSFFEK FHGDQECYYQ DDLRVLSSLT KKEHMKGNET 

205 210 215 220 225 230 235 240 245 250 

* * * ■♦ * 

MLDFRTSKFV LRISRDSYQL LKRHLQEKQN NQIWNIVQEH LYIDIFDGMP 

255 260 265 270 275 280 285 290 295 300

* * * * *

RSKQQIDAMV GSLAGEAKRE ANKSKVFFGL LKEPEIEVPL DDEDEEGENE 

305 310 315 320 325 330 335 340 345 350 

* ★ * * * 

EGKPKKKKPK KDSIGSKSKK QDPNAPPQNR IPLPELKDSD KLDKIMNMKE 

355 360 365 370 375 380 385 390 395 400 
***** 

TTKRVRLGPD CLPSICFYTF LNAYQGLTAV DVTDDSSLIA GGFADSTVRV 

405 410 415 420 425 430 435 440 445 450 

* * * * * 

WSVTPKKLRS VKQASDLSLI DKESDDVLER IMDEKTASEL KILYGHSGPV 

455 460 465 470 475 480 485 490 495 500 

* ★ * * * 

YGASFSPDRN YLLSSSEDGT VRLWSLQTFT CLVGYKGHNY PVWDTQFSPY 

505 510 515 520 525 530 535 540 545 550 
***** 

GYYFVSGGHD RVARLWATDH YQPLRIFAGH LADVNCTRFH PNSNYVATGS 

555 560 565 570 575 580 585 590 595 600 
***** 

ADRTVRLWDV LNGNCVRIFT GHKGPIHSLT FSPNGRFLAT GATDGRVLLW 

605 610 615 620 . 625 630 635 640 645 650 

* * ★ * * 

DIGHGLMVGE LKGHTDTVCS LRFSRDGEIL ASGSMDNTVR LWDAIKAFED 

655 660 665 670 675 680 685 690 695 700 

* * ♦ * . ♦ 

LETDDFTTAT GHINLPENSQ ELLLGTYMTK STPWHLHFT RRNLVLAAGA 

705 
YSPQ* 
Sequence Range: 1 to 3820

10 20 30 40 50 

* ♦ * "* * 

AAT TCC TTT TTT ATA ACA AAC GCA AAT TAG TTA ATT AAA TTC TGG CGC AGA ACC GGC 
TTA AGG AAA AAA TAT TGT TTG CGT TTA ATC AAT TAA TTT AAG ACC GCG TCT TGG CCG 

70 80 90 100 , 110 

n * ♦ * * 

TGA GCG ATG GAA ACG CAA CCT GAG GTG CCC GAG GTG CCG CTG CGA CCG TTT AAA TTG 
ACT CGC TAG CTT TGC GTT GGA CTC CAC GGG CTC CAC GGC GAC GCT GGC AAA TTT AAC 
METQPEVPEVPLRP FKL 

130 140 150 160 170 

* * . * * * 

CAT CAG GTT GTG AGC CTC ACG GGC ATC AGT TTC GAG CGG AGG AGC ATA ATC GGC GTG 
GTA GTC CAA CAC TCG GAG TGC CCG TAG TCA AAG CTC GCC TCC TCG TAT TAG CCG CAC 
HQVVSLTGISFERRSI IGV 

190 200 210 220 230 

* * * * * 

GAG CTG ACC ATT GTG CCG AAC AGC GAG AAT CTG CGC CTG ATA CGC CTG AAT GCC AAG 
CTC GAC TGG TAA CAC GGC TTG TCG CTC TTA GAC GCG GAC TAT GCG GAC TTA CGG TTC 
ELTIVPNSENLR LIRLNA K 

250 260 270 280 290 

* ♦ • * * * 

CTG AGA ATC TAC AGC GTC GTT TTG AAC GAT GTC TGC CAG GCG GAT TTC ACG TAC TTC
GAC TCT TAG ATG TCG CAG CAA AAC TTG CTA CAG ACG GTC CGC CTA AAG TGC ATG AAG
LRIYSVVLNDVCQADFTYF

310 320 330 . 340 350 

* ★ * * ■ . * 

CCC TTC CAG AAC ATC TGC TAC AAG GAG CCC AAG AGC CGC GCT CTG GAG GTC TAC TCC 
GGG AAG GTC TTG TAG ACG ATG TTC CTC GGG TTC TCG GCG CGA GAC CTC CAG ATG AGG 
PFQNICYKEPKSRALEVYS 

370 380 390 400 410 

* * * * « 

CAT CAT CTG ACC GCC GCC CAG TAC ACC GAT CCC GAT GTG AAC AAC GGC GAA CTG CTC 
GTA GTA GAC TGG CGG CGG GTC ATG TGG CTA GGG CTA CAC TTG TTG CCG CTT GAC GAG 
HHLTAAQYTDPDVNNGELL

430 440 450 460 470 

« * . ♦ * * 

CAG GTT CCG CCC GAG GGC TAC TCT ATG ATC CAG GAG GGT CAG GGT CTG CGC ATC CGC 
GTC CAA GGC GGG CTC CCG ATG AGA TAC TAG GTC CTC CCA GTC CCA GAC GCG TAG GCG 
QVPPEGYSMIQEGQGLRIR 

490 500 510 520 530 

* * . ★ * * 

GAG TTC TCG TTG GAG AAT CCC AAA TGC GGC GTA CAT TTT GTC ATA CCA CCC GCT TCA 
CTC AAG AGC AAC CTC TTA GGG TTT ACG CCG CAT GTA AAA CAG TAT GGT GGG CGA AGT 
EF SLENPKCGVHFVIPPAS 

550 560 570 580 590 

* * * * ♦ 

GAC GAG GAG ACA CAG ATG AAC AGC TCG CAT ATG TTC ACC AAT TGC TAT GAA AAC TCG 
CTG CTC CTC TGT GTC TAC TTG TCG AGC GTA TAC AAG TGG TTA ACG ATA CTT TTG AGC 
DEETQMNSSHMFTNCYENS

610 620 630 640 650 
AGA TTG TGG TTT CCC TGC GTG GAC AGT TTC GCC GAT CCC TGC ACC TGG CGG CTG GAG
TCT AAC ACC AAA GGG ACG CAC CTG TCA AAG CGG CTA GGG ACG TGG ACC GCC GAC CTC 
RLWFPCV DSFADPCTWRLE 

670 680 690 700 710 

* ♦ * ♦ ★ 

ACT GTC GAC AAA AAT ATG ACC GCC GTT TCG TGT GGA GAA CTT CTA GAA GTC ATT ATG 
TGA CAG CTG TTT TTA TAC TGG CGG CAA AGC ACA CCT CTT GAA GAT CTT CAG TAA TAG 
TVDKN MTAVSCGELLEVIM 

730 740 750 760 770 

* * * * * ■ 

CCA GAT CTG CGA AAG AAA ACC TTC CAC TAT TCG GTT AGC ACA CCA GTA TGT GCA CCA
GGT CTA GAC GCT TTC TTT TGG AAG GTG ATA AGC CAA TCG TGT GGT CAT ACA CGT GGT
PDLRKKTFHYSVSTPVCAP

790 800 810 820 830 

* * * * * 

ATT GCG CTG GCT GTG GGT CAG TTT GAG ATC TAC GTG GAT CCG CAC ATG CAT GAA GTG 
TAA CGC GAC CGA CAC CCA GTC AAA CTC TAG ATG CAC CTA GGC GTG TAC GTA CTT CAC 
IALAVGQFEIYVDPHMHEV

850 860 870 880 890 

* * . * * 

CAC TTT TGT CTG CCC GGA TTG TTG CCG CTG TTA AAA AAT ACG GTT CGC TAT TTG CAC 
GTG AAA ACA GAC GGG CCT AAC AAC GGC GAC AAT TTT TTA TGC CAA GCG ATA AAC GTG 
HFCLPGLLPLLKNT VRYL H 

910 920 930 940 950 

* • ★ * * * 

GCA TTT GAA TTT TAC GAG GAG ACC TTA TCT ACG CGC TAC CCA TTC AGT TGC TAC AAA 
CGT AAA CTT AAA ATG CTC CTC TGG AAT AGA TGC GCG ATG GGT AAG TCA ACG ATG TTT 
A F E F Y E ETL S T RY P F S C YK 

970 980 990 1000 1010 

* * * * ★ 

GTG TTT GTA GAC GAA TTG GAC ACG GAC ATA AGT GCC TAT GCC ACT ATG AGC ATT GCT
CAC AAA CAT CTG CTT AAC CTG TGC CTG TAT TCA CGG ATA CGG TGA TAC TCG TAA CGA 
VFVDELDTD ISAYAT MSIA 

1030 1040 1050 1060 1070 

GTG AAC CTG CTG CAC TCC ATA GCT ATC ATC GAT CAG ACC TAT ATA TCT CGA ACC TTT 
CAC TTG GAC GAC GTG AGG TAT CGA TAG TAG CTA GTC TGG ATA TAT AGA GCT TGG AAA 
VNLLHSIAIIDQTYISRTF

1090 1100 1110 1120 1130 

it ♦ ★ . ★ . * 

TCG CGC GCT GTG GCT GAG CAA TTC TTC GGC TGC TTT ATT ACA TCG CAT CAT TGG TCG 
AGC GCG CGA CAC CGA CTC GTT AAG AAG CCG ACG AAA TAA TGT AGC GTA GTA ACC AGC 
SRAVAEQFF GC FITSHHW S 

1150 1160 1170 1180 1190 

* ♦ * * * 

ACC TGG CTG GCC AAG GGC ATT GCG GAG TAC CTG TGT GGA TTG TAT TCC AGG AAG TGC 
TGG ACC GAC CGG TTC CCG TAA CGC CTC ATG GAC ACA CCT AAC ATA AGG TCC TTC ACG 
TW LA KGIAE YLCGLYSRKC 

1210 1220 1230 1240 1250 

* • * ★ * 

GGC AAC AAC GAG TAC CGT GCT TGG GTG CAA TCT GAA CTG GCG CGT GTC GTT CGC TAC 
CCG TTG TTG CTC ATG GCA CGA ACC CAC GTT AGA CTT GAC CGC GCA CAG CAA GCG ATG
GNNEYRAWVQSELARVVRY

1270 1280 1290 1300 1310

GAG CAG TAT GGC GGC ATT ATT CTC GAT TGC AGT CAG CCG CCA GCA CCT TTG CCT GTT
CTC GTC ATA CCG CCG TAA TAA GAG CTA ACG TCA GTC GGC GGT CGT GGA AAC GGA CAA
EQYGGIILDCSQPPAPLPV

1330 1340 1350 1360 1370

GGC ACA AAT CAA TCG GCT GCT TCC AGC AAA CAG CAG GAG ATT GTC CAC TAT TTT CCG
CCG TGT TTA GTT AGC CGA CGA AGG TCG TTT GTC GTC CTC TAA CAG GTG ATA AAA GGG
GTNQSAASSKQQEIVHYFP

1390 1400 1410 1420 1430

AAG AGT TTG CAC ACC GTA TCG CCG AAG TAT GTG GAG GCG ATG CGA AGG AAA GCG CAT
TTC TCA AAC GTG TGG CAT AGC GGC TTC ATA CAC CTC CGC TAC GCT TCC TTT CGC GTA
KSLHTVSPKYVEAMRRKAH

1450 1460 1470 1480 1490

GTA ATC CGA ATG CTG GAG AAC CGC ATC GGG CAG GAG CTG CTG ATT CAG GTG TTC AAT
CAT TAG GCT TAC GAC CTC TTG GCG TAG CCC GTC CTC GAC GAC TAA GTC CAC AAG TTA
VIRMLENRIGQELLIQVFN

1510 1520 1530 1540 1550

CAA TTG GCT TTG GCT TCT AGT GCG GCA ACG ACG AAG ATC GGT GCA GGA CTC TGG TCT
GTT AAC CGA AAC CGA AGA TCA CGC CGT TGC TGC TTC TAG CCA CGT CCT GAG ACC AGA
QLALASSAATTKIGAGLWS

1570 1580 1590 1600 1610

CTG CTC ATC TCG AAC CAA CAT TTT TAT CAA GGC CAT CTT CAC GTA ACC GGA AAA GAT
GAC GAG TAG AGC TTG GTT GTA AAA ATA GTT CCG GTA GAA GTG CAT TGG CCT TTT CTA
LLISNQHFYQGHLHVTGKD

1630 1640 1650 1660 1670

TCT GTC TTC ATG GAC CAG TGG GTG CGC ACT GGA GGG CAC GCC AAG TTT TCG CTC ACA
AGA CAG AAG TAC CTG GTC ACC CAC GCG TGA CCT CCC GTG CGG TTC AAA AGC GAG TGT
SVFMDQWVRTGGHAKFSLT

1690 1700 1710 1720 1730

GTG TTC AAT CGC AAG AGA AAC ACG ATT GAA CTG GAA ATC CGC CAG GAC TAT GTT AAT
CAC AAG TTA GCG TTC TCT TTG TGC TAA CTT GAC CTT TAG GCG GTC CTG ATA CAA TTA
VFNRKRNTIELEIRQDYVN

1750 1760 1770 1780 1790

CGG GGA ATT AGA AAA TAC AAT GGT CCA TTG ATG GTG CAG CTG CAG GAG TTG GAT GGA
GCC CCT TAA TCT TTT ATG TTA CCA GGT AAC TAC CAC GTC GAC GTC CTC AAC CTA CCT
RGIRKYNGPLMVQLQELDG

1810 1820 1830 1840 1850

TTT AAG CAC ACA TTG CAG ATT GAG AGT ACC CTG GTA AAG TCC GAT ATC ACT TGT CAC
AAA TTC GTG TGT AAC GTC TAA CTC TCA TGG GAC CAT TTC AGG CTA TAG TGA ACA GTG
FKHTLQIESTLVKSDITCH

1870 1880 1890 1900 1910

* * * * ♦ 

AAG AGC AGG CGT AAC AAA AAG AAG AAG ATC CCC TTG TGC ACC GGT GAG GAA GTG GAT 
TTC TCG TCC GCA TTG TTT TTG TTC TTC TAG GGG AAC ACG TGG CCA CTC CTT CAC CTA 
KSRRNKKKKIPLCTGEEVD

1930 1940 1950 1960 1970 

• * * . * « » 

GAT TTA TCA GCC ATG GAC GAG TCA CCT GTG CTT TGG ATC CGC CTC GAT CCC GAA ATG 
CTA AAT ACT CGG TAG GTG CTG AGT GGA CAC GAA ACC TAG GCG GAG CTA GGG CTT TAC 
D L SA MD D S P VL W I R LD P E M 

1990 2000 2010 2020 2030 

* * * ★ * 

CTG CTG CGC GAC CTC ATA ATC GAA CAG CCC GAC TTC CAG TGG CAG TAT CAG CTT CGG 
GAC GAC GCG CTG GAG TAT TAG CTT GTC GGG CTG AAG GTC ACC GTC ATA GTC GAA GCC 
L L R D L I I E Q P D F Q W Q Y Q L R 

2050 2060 2070 2080 2090 

* * * * ★ 

GAA CGT GAT GTT ACT GCT CAA TTT CAG GCG ATT CAA GCC CTG CAA AAG TAC CCC ACG 
CTT GCA CTA CAA TGA CGA GTT AAA GTC CGC TAA GTT CGG GAC GTT TTC ATG GGG TGC 
ER DVTAQFQAIQALQK Y PT 

2110 2120 2130 2140 2150 

* * * * . ♦ 

GCC ACC AGG CTT GCT TTA ACC GAC ACC ATA GAA AGC GAA CGT TGC TTC TAT CAG GTG 
CGG TGG TCC GAA CGA AAT TGG CTG TGG TAT CTT TCG CTT GCA ACG AAG ATA GTC CAC 
A T R L A L T D T I E S E R C F Y Q V 

2170 2180 2190 2200 2210 

* * * * ♦ 

TGC GAG GCA GCC CAC AGC TTG ACC AAA GTG GCC AAC CAG ATG GTG GCC TCC TGG AGT 
ACG CTC CGT CGG GTG TCG AAC TGG TTT CAC CGG TTG GTC TAC CAC CGG AGG ACC TCA
C E A A H S L T KVA NQ MV A S WS 

2230 2240 2250 2260 2270 

* « * * . ♦ 

CCG CCC GCC ATG CTG AAC ATA TTT AGG AAG TTT TTC GGC TCA TTT AGT GCT CCG CAC 
GGC GGG CGG TAC GAC TTG TAT AAA TCC TTC AAA AAG CCG AGT AAA TCA CGA GGC GTG 
PPAMLNI FRKF FG SFSA PH 

2290 2300 2310 2320 2330 

* * ★ ★ * 

ATC AAA CTG AAC AAC TTC TCC AAC TTT CAG CTG TAC TTC CTG CAG AAG GCT ATT CCC 
TAG TTT GAC TTG TTG AAG AGG TTG AAA GTC GAC ATG AAG GAC GTC TTC CGA TAA GGG 
IKLNNFS NFQLYFLQKA IP 

2350 2360 2370 2380 2390 

* * ■ ♦ * ♦ 

GCC ATG GCA GGT CTG CGC ACA TCT CAT GGT ATT TGC CCG CCG GAA GTG ATG CGT TTT 
CGG TAC CGT CCA GAC GCG TGT AGA GTA CCA TAA ACG GGC GGC CTT CAC TAC GCA AAA 
AMAG LR T S H G I C PP E VMRF 

2410 2420 2430 2440 2450 

* * * ★ * 

TTC GAT CTC TTC AAG TAC AAC GAG AAT TCG CGT AAC CAT TAC ACG GAT GCA TAC TAC 
AAG CTA GAG AAG TTC ATG TTG CTC TTA AGC GCA TTG GTA ATG TGC CTA CGT ATG ATG 
FDL FKYN ENSRNHYT DAYY 

2470 2480 2490 2500 2510 
GCA GCT TTG GTA GAA GCT CTA GGC GAA ACC TTA ACA CCT GTG GTC TCC GTT GCT ATC
CGT CGA AAC CAT CTT CGA GAT CCG CTT TGG AAT TGT GGA CAC CAG AGG CAA CGA TAG
AALVEALGETLTPVVSVAI

2530 2540 2550 2560 2570 

* ♦ * * ♦ 

GGC ACA CAA ATC ACT ACG GAC AGT CTA TCC ACG GAT GCG AAA CTT GTG CTA GAT GAA 
CCG TGT GTT TAG TGA TGC CTG TCA GAT AGG TGC CTA CGC TTT GAA CAC GAT CTA CTT 
GTQITTDSLSTDAKLVLDE 

2590 2600 2610 2620 2630 

* ♦ * ★ * 

ACA CGT CTG CTG AAC ATG GAG AAA CAT CTA CCC TCG TAC AAG TAC ATG GTG TCC GTG 
TGT GCA GAC GAC TTG TAC CTC TTT GTA GAT GGG AGC ATG TTC ATG TAC CAC AGG CAC 
TRLLNMEKHLPSYKYMVSV 

2650 2660 2670 2680 2690 

* * - ■* * * ' 

TGT CTG AAG GTC ATC CGG AAG CTG CAA AAA TTC GGT CAT CTG CCC TCA CTG CCG CAC 
ACA GAC TTC CAG TAG GCC TTC GAC GTT TTT AAG CCA GTA GAC GGG AGT GAC GGC GTG 
CLKVIRKLQKFGH LPSLP H 

2710 2720 2730 2740 2750 

* * * * ^ 

TAC CGC AGC TAT GCC GAA TAT GGA ATA TAT CTC GAT CTC CGC ATT GCT GCT ATG GAG 
ATG GCG TCG ATA CGG CTT ATA CCT TAT ATA GAG CTA GAG GCG TAA CGA CGA TAC CTC 
YRSYAEYGIYLDLRIAAME

2770 2780 2790 2800 2810 

* * * * ★ 

CTC GTG GAC TTT GTG AAA GTG GAT GGG CGC AGC GAG GAT TTG GAA CAT TTG ATT ACT
GAG CAC CTG AAA CAC TTT CAC CTA CCC GCG TCG CTC CTA AAC CTT GTA AAC TAA TGA 
LVDFVKVDGRSEDLEHLIT 

2830 2840 2850 2860 2870 

* * . * * * 

CTG GAA ACT GAT CCG GAT CCG GCT GCT CGC CAT GCA CTG GCC CAA CTG CTG ATC GAT 
GAC CTT TGA CTA GGC CTA GGC CGA CGA GCG GTA CGT GAC CGG GTT GAC GAC TAG CTA 
LETDPD PAARHALAQLLID 

2890 2900 2910 2920 2930 

* * * ^* * 

CCG CCT TTC ACA CGC GAA TCT CGC AGC CGT CTG GAT AAA CCC AAT CTC GTG GAT CGT 
GGC GGA AAG TGT GCG CTT AGA GCG TCG GCA GAC CTA TTT GGG TTA GAG CAC CTA GCA 
PPFTRESRSRLDKPNLVDR

2950 2960 2970 2980 2990 

* ■ * * * ★ • 

TGG TTC AGT ATT AAT CGC TTG CCC TAC GAT ACC CAA STG CGC TGC GAT ATT GTC GAT 
ACC AAG TCA TAA TTA GCG AAC GGG ATG CTA TGG GTT SAC GCG ACG CTA TAA CAG CTA 
WFSINRLPYDTQXRCDIVD 

3010 3020 3030 3040 3050 

* * * * * 

TAC TAC GCA CTG TAC GGA ACT AAG CGT CCG AAT TGC TTG CAG GCC GGC GAG AAC CAA 
ATG ATG CGT GAC ATG CCT TGA TTC GCA GGC TTA ACG AAC GTC CGG CCG CTC TTG GTT
YYALYGTKRPNCLQAGENQ 

3070 3080 . 3090 3100 3110 

* * * * * 

TTC TAC AAG GAT TTG ATG AAG GAC AAT AAT AGC AGT GTA GGC AGC GTA ACC GGC AGC 
AAG ATG TTC CTA AAC TAG TTC CTG TTA TTA TCG TCA CAT CCG TCG CAT TGG CCG TCG
FYKDLMKDNNSSVGSVTGS 



3130 3140 3150 3160 3170

AAG AAG ACC AGT GAT TCA AAG TCA CAT TTG CCA ACA CCA ACG AAT ACT TTG GAC AAT
TTC TTC TGG TCA CTA AGT TTC AGT GTA AAC GGT TGT GGT TGC TTA TGA AAC CTG TTA
KKTSDSKSHLPTPTNTLDN

3190 3200 3210 3220 3230

CCA CAG GAG CGG CAA AAG CCG GCA ATG GTT ACC ATC AAG CGA ACG GCC ACA GAA GCA
GGT GTC CTC GCC GTT TTC GGC CGT TAC CAA TGG TAG TTC GCT TGC CGG TGT CTT CGT
PQERQKPAMVTIKRTATEA

3250 3260 3270 3280 3290

GAG GTG GGC GAT GAG ATT ATC AAG CTG GAA CGC AGC GAG GAG ATC ACC GTG CTA GAT
CTC CAC CCG CTA CTC TAA TAG TTC GAC CTT GCG TCG CTC CTC TAG TGG CAC GAT CTA
EVGDEIIKLERSEEITVLD

3310 3320 3330 3340 3350

CCA GTT AAC GTG CAG GCC TAT GAC AGT GAG ACC AAA GTG AAT GCC CTG CAG GCA GAT
GGT CAA TTG CAC GTC CGG ATA CTG TCA CTC TGG TTT CAC TTA CGG GAC GTC CGT CTA
PVNVQAYDSETKVNALQAD

3370 3380 3390 3400 3410

GAA GCA CGT GAT ACC CAT CAG GCT GCC AAG CGC CTT AAG AAC GAA ATG TAC GCC GAG
CTT CGT GCA CTA TGG GTA GTC CGA CGG TTC GCG GAA TTC TTG CTT TAC ATG CGG CTC
EARDTHQAAKRLKNEMYAE

3430 3440 3450 3460 3470

GAT AAC TCA TCC ACA ATG CTC GAC GTG GGC GAC TCC ACC AGA TAT GAG AGT AGC CAC
CTA TTG AGT AGG TGT TAC GAG CTG CAC CCG CTG AGG TGG TCT ATA CTC TCA TCG GTG
DNSSTMLDVGDSTRYESSH

3490 3500 3510 3520 3530

GAG GGC AAA TTG AAG TCC GGC GAT GGT GGG CTC AAG AAG AAA AAG AAG AAG GAG AAG
CTC CCG TTT AAC TTC AGG CCG CTA CCA CCC GAG TTC TTC TTT TTC TTC TTC CTC TTC
EGKLKSGDGGLKKKKKKEK

3550 3560 3570 3580 3590

AAG CAT AAG CAC AAA CVC AAG CAT AGG CAC AGC AAG GAC AAG GAC AAG GAG CGA AAG
TTC GTA TTC GTG TTT GBG TTC GTA TCC GTG TCG TTC CTG TTC CTG TTC CTC GCT TTC
KHKHKXKHRHSKDKDKERK

3610 3620 3630 3640 3650

AAG GAC AAG CGT GAC CCG CAT ATA TTC ACC CTG CAG GCG CGC GAG ACA GCC ACT CCG
TTC CTG TTC GCA CTG GGC GTA TAT AAG TGG GAC GTC CGC GCG CTC TGT CGG TGA GGC
KDKRDPHIFTLQARETATP

3670 3680 3690 3700 3710

ACT CTC AGC TCG GAG GAC AGT AGC AAC AGC AAT AGC CTG CCG CCC ATG AAC CTT AAC
TGA GAG TCG AGC CTC CTG TCA TCG TTG TCG TTA TCG GAC GGC GGG TAC TTG GAA TTG
TLSSEDSSNSNSLPPMNLN

3730 3740 3750 3760 3770

GTG AGG GTT CCT ACA GGT GGG GAA ATT GCA ATG TTT GGG GGA TAG ATG ACA GAA TAA
CAC TCC CAA GGA TGT CCA CCC CTT TAA CGT TAC AAA CCC CCT ATC TAC TGT CTT ATT
VRVPTGGEIAMFGG*MTE*

3790 3800 3810 3820

TAT AAT ACC TTA AAA AAA AAA AAA AAA AAA AAA AAA AAA A
ATA TTA TGG AAT TTT TTT TTT TTT TTT TTT TTT TTT TTT T
YNTLKKKKKKKKKX>
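In these listings the lower strand is printed directly beneath the upper strand as its base-by-base complement, read 3'->5' from left to right, so a damaged codon on either strand can often be recovered by re-deriving it from the other. The following is a minimal Python sketch of that check, our own illustration rather than anything described in the application; the names `complement_row` and `mismatched_codons` are hypothetical, and the IUPAC ambiguity codes (such as the CVC/GBG pair in row 3550) are assumed to follow standard pairing.

```python
# Base pairing for the strands printed in the listings, including the IUPAC
# ambiguity codes (S, V, B, ...) that occasionally appear in them.
_PAIR = str.maketrans("ACGTNSWRYKMBVDH", "TGCANSWYRMKVBHD")


def complement_row(row: str) -> str:
    """Return the base-by-base complement of a listing row.

    The lower strand is printed under the upper strand and read 3'->5'
    left to right, so it is complemented but deliberately NOT reversed.
    Codon spacing is preserved.
    """
    return row.upper().translate(_PAIR)


def mismatched_codons(top: str, bottom: str) -> list[int]:
    """Return 1-based positions where the printed lower-strand codon is not
    the complement of the upper-strand codon (a likely transcription error)."""
    expected = complement_row(top).split()
    printed = bottom.upper().split()
    return [
        i
        for i, (exp, got) in enumerate(zip(expected, printed), start=1)
        if exp != got
    ]


if __name__ == "__main__":
    # An upper-strand codon TAG printed opposite ATG: position 4 is flagged.
    print(mismatched_codons("CTG AGA ATC TAG AGC", "GAC TCT TAG ATG TCG"))  # [4]
```

A flagged position only says that the two strands disagree; deciding which strand was misread still needs a third source, such as the translation row or the protein listing.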
LOCUS TRANSLDTAF 1217 PJ^. PROT

FEATURES From To/Span Description

Peptide 1 > 1217 67 to 3820 of dTAF150 (translated)

[Split]

< 1218 1218 67 to 3820 of dTAF150 (translated)

[Split]

1 METQPEVPEV PLRPFKLAHQ VVSLTGISFE RRSIIGVVEL TIVPNSENLR LIRLNAKQLR
61 IYSVVLNDVC QADFTYFDPF QNICYKEPKS RALEVYSKHH LTAAQYTDPD VNNGELLIQV
121 PPEGYSMIQE GQGLRIRIEF SLENPKCGVH FVIPPASTDE ETQMNSSHMF TNCYENSSRL 
181 WFPCVDSFAD PCTWRLEFTV DKNMTAVSCG ELLEVIMTPD LRKKTFHYSV STPVCAPNIA 
241 LAVGQFEIYV DPHMHEVTHF CLPGLLPLLK NTVRYLHEAF EFYEETLSTR YPFSCYKQVF
301 VDELDTDISA YATMSIASVN LLHSIAIIDQ TYISRTFMSR AVAEQFFGCF ITSHHWSDTW 
361 LAKGIAEYLC GLYSRKCFGN NEYRAWVQSE UUIWRYEEQ YGGIILDCSQ PPAPLPVSGT 
421 NQSAASSKQQ EIVHYFPIKS LHTVSPKYVE AMRRKAHFVI RMLENRIGQE LLIQVFNKQL 
481 ALASSAATTK IGAGLWSQLL ISNQHFYQGH LHVTGKDMSV FMDQWVRTGG HAKFSLTSVF 
541 NRKRNTIELE IRQDYVNQRG IRKYNGPLMV QLQELDGTFK HTLQIESTLV KSDITCHSKS 
601 RRNKKKKIPL CTGEEVDMDL SAMDDSPVLW IRLDPEMILL RDLIIEQPDF QWQYQLRHER 
661 DVTAQFQAIQ ALQKYPTNAT RLALTDTIES ERCFYQVRCE AAHSLTKVAN QMVASWSGPP 
721 AMLNIFRKFF GSFSAPHIIK LNNFSNFQLY FLQKAIPVAM AGLRTSHGIC PPEVMRFLFD 
781 LFKYNENSRN HYTDAYYRAA LVEALGETLT PVVSVAIHGT QITTDSLSTD AKLVLDEVTR
841 LLNMEKHLPS YKYMVSVSCL KVIRKLQKFG HLPSLPHIYR SYAEYGIYLD LRIAAMECLV
901 DFVKVDGRSE DLEHLITLLE TDPDPAARHA LAQLLIDNPP FTRESRSRLD KPNLVDRLWF 
961 SINRLPYDTQ XRCDIVDLYY ALYGTKRPNC LQAGENQSFY KDLMKDNNSS VGSVTGSFKK 
1021 TSDSKSHLPT PTNTLDNEPQ ERQKPAMVTI KRTATEAFEV GDEIIKLERS EEITVLDEPV 
1081 NVQAYDSETK VNALQADEEA RDTHQAAKRL KNEMYAEDDN SSTMLDVGDS TRYESSHEEG 
1141 KLKSGDGGLK KKKKKEKKKH KHKXKHRHSK DKDKERKDKD KRDPHIFTLQ ARETATPDTL 
1201 SSEDSSNSNS LPPMNLN 
Sequence Range: 1 to 872

10 20 30 40 50 60 

***** 
CCAAAAATCC GCCCAACTTA CTGTACTTTC CCCAAACACT TCCAACCAAC CGACCTACCA 
GGTTTTTAGG CGGGTTGAAT GACATGAAAG GGGTTTGTGA AGGTTGGTTG GCTGGATGGT 

70 80 . 90 100 110 

* * * * 

CCCACTTGAT TTGACTCTGA AGAAACCCAA AAGCA ATG TCG GAT CTC TTT ACC ACT 
GGGTGAACTA AACTGAGACT TCTTTGGGTT TTCGT TAG AGC CTA GAG AAA TGG TGA 

M S D L F T T> 
TRANSLATION OF 802 F : 

120 130 140 150 160 

* * * * ★ - 

TTC GAT AGC AAC GGC GTC GCG AGG CAC CAC CTG CAC CAC AAC CAC AAC 

AAG CTA TCG TTG CCG CAG CGC TCC GTG GTG GAC GTG GTG TTG GTG TTG 

FDSNGVARHHLHHNHN> 
a a a a TRANSLATION OF 802 FULL (A) a a a a > 

170 180 190 200 210 ^ 



* 



TCC ACA TCG TCC GCC AGC GGA CTG CTC CAC GAC CCA CCC ATG GCC TCG 
AGG TGT AGC AGG CGG TCG .CCT GAC GAG GTG CTG GGT GGG TAC CGG AGC 

STSSA SGLLHDPP MA S> 
a a a a TRANSLATION OF 802 FULL [A] a a. a a > 

220 230 240 250 260 

* * * * ★ 

CCC TCC CAG CAC AGT CCG ATG ACC AAC AAC AGC AAC TCA TCC TCG CAG 
GGG AGG GTC GTG TCA GGC TAC TGG TTG TTG TCG TTG AGT AGG AGC GTC 
PSQHSPMTNNSNSSSQ> 
a a a a TRANSLATION OF 802 FULL (A) a a a a > 

270 280 290 300 

* * * * . 

AAC GGC GGA CCG GTT TCC GGT TTG GGT ACG GGA ACG GGC CCC ATA TCT
TTG CCG CCT GGC CAA AGG CCA AAC CCA TGC CCT TGC CCG GGG TAT AGA
NGGPVSGLGTGTGPIS>
a a a a TRANSLATION OF 802 FULL (A) a a a a > 

310 320 330 340 350 



GGT GGT AGC AAG TCA TCC AAT CAC ACA TCA TCC GCC GCC GGT TCC GAG 
CCA CCA TCG TTC AGT AGG TTA GTG TGT AGT AGG CGG CGG CCA AGG CTC 

GGSKSSNHTSSAAGSE> 
a a a a TRANSLATION OF 802 FULL [A] a a a a > 

360 370 380 390 400

* * ^ * * « 

AAC ACT CCC ATG CTT ACC AAA CCG CGT CTC ACA GAG CTC GTC CGA GAG 
TTG TGA GGG TAC GAA TGG TTT GGC GCA GAG TGT CTC GAG CAG GCT CTC 

NTPMLTKPRLTELVR E> 
a a a a TRANSLATION OF 802 FULL (A) a a a a > 

410 420 430 440 450 

***** 
GTG GAT ACC ACC ACG CAG CTG GAC GAG GAT GTT GAG GAG CTT CTG CTT 
CAC CTA TGG TGG TGC GTC GAC CTG CTC CTA CAA CTC CTC GAA GAC GAA 

VDTTTQLDEDVEELLL> 
a a a a TRANSLATION OF 802 FULL (A] a a a a > 
460 470 480 490 500

* * * * * 

CAG ATC ATC GAC GAC TTT GTG AGG GAC ACC GTC AAG TCG ACG AGC GCC
GTC TAG TAG CTG CTG AAA CAC TCC CTG TGG CAG TTC AGC TGC TCG CGG
QIIDDFVRDTVKSTSA>
a a a a TRANSLATION OF 802 FULL [A] a a a a > 

510 520 530 540 

★ * ■ * * • 

TTC GCC AAG CAC CGA AAG TCT AAC AAG ATC GAG GTG CGC GAC GTG CAG 
AAG CGG TTC GTG GCT TTC AGA TTG TTC TAG CTC CAC GCG CTG CAC GTC 
FAKHRKSNKIEVRDVQ>
a a a a TRANSLATION OF 802 FULL (A) a a a a > 

550 560 570 580 590 

* * * * * 

CTG CAC TTT GAG CGG AAG TAC AAC ATG TGG ATA CCC GGC TTC GGT ACG 
GAC GTG AAA CTC GCC TTC ATG TTG TAC ACC TAT GGG CCG AAG CCA TGC
LHFERKYNMWI PG FGT> 
a a a a TRANSLATION OF 802 FULL (A] a a a a > 

600 610 620 630 640 

★ * . ★ * * 

GAC GAA CTG CGT CCC TAC AAG CGG GCA GCT GTC ACG GAG GCG CAC AAA 
CTG CTT GAC GCA GGG ATG TTC GCC CGT CGA CAG TGC CTC CGC GTG TTT 
DELRP YKRAAVTE AHK> 
a a a a TRANSLATION OF 802 FULL (A] a a a a > 

650 660 670 680 690 

* * * * ♦ 

CAG CGC CTT GCC CTC ATA CGG AAA ACG ATC AAG AAA TAC TAG AGGA 
GTC GCG GAA CGG GAG TAT GCC TTT TGC TAG TTC TTT ATG ATC TCCT 
QRLALIRKTIKKY *> 
a a a TRANSLATION OF 802 FULL (A] a a a > 

700 710 720 730 740 750 

* * « * H i, 

TTGGATCTAA TCGGGTCGAG GCTCTGTTTC GGTTTGCCGG ATTTCGCGTA TGCTAAACGT 
AACCTAGATT AGCCCAGCTC CGAGACAAAG CCAAACGGCC TAAAGCGCAT ACGATTTGCA 

760 770 780 790 800 810 

★ * * * * ' i, • ^ 

GCACACGCCA CAAACTAATT TAAGCTCCAA TTTAGATTAA ATAACAAATT ATCGTCGCTC 
CGTGTGCGGT GTTTGATTAA ATTCGAGGTT AAATCTAATT TATTGTTTAA TAGCAGCGAG 

820 830 840 850 860 870 

it ★ . * ★ « ^ 

TATTGTAGAT TTATTGTAAT AAAAGTGCAC TATTGATTTC ACATTCAAAA AAAAAAAAAA 
ATAACATCTA AATAACATTA TTTTCACGTG ATAACTAAAG TGTAAGTTTT TTTTTTTTTT 

AA 
TT 
Sequence Range: 1 to 739

10 20 30 40 50 

* * * ♦ * 
CCCCCCCCCC CCCCCCCCGA TTTTTTTTAA ATG GAC GAA ATC CTC TTT CCC ACG
GGGGGGGGGG GGGGGGGGCT AAAAAAAATT TAG CTG CTT TAG GAG AAA GGG TGC 

MDEILFPT> 
TRANSLATION OF 911G FULL >

60 70 80 90 100 

* ♦ * * * 

GAG CAA AAG AGC AAC TCC CTA AGC GAC GGG GAC GAT GTC GAC CTG AAA 
GTC GTT TTC TCG TTG AGG GAT TCG CTG CCG CTG CTA GAG CTG GAC TTT
QQK S N S L S <§)G(S,>(^\;(hL K> 
a a a TRANSLATION OF 911G FULL 5/20 [A] a a a > 

110 * 120 130 140 150 

* ' * * * « 
TTC TTC CAG TCG GGC CTC CGG GGG AGG CGA AAG GAC AGC GAC ACC TCG 
AAG AAG GTC AGC CCG GAG GCC CCC TCC GCT TTC CTG TCG CTG TGG AGC 

FFQSGLRGRRKDSDTS>
. a a a TRANSLATION OF 911G FULL 5/20 (A] a -a a > 

160 170 180 190 

* * * * 

GAT CCG GGA AAC GAT GCG GAT CGT GAT GGC AAA GAT GCG GAT GGG GAC 
CTA GGC CCT TTG CTA CGC CTA GCA CTA CCG TTT CTA CGC CTA CCC CTG 
DPGNDADRDGKDADGD>
a a a TRANSLATION OF 911G FULL 5/20 [A] a a a > 

200 210 220 230 240 

* * * . * * 

AAC GAC AAC AAG AAC ACG GAC GGA GAT GGT GAC TCT GGC GAG CCG GCG
TTG CTG TTG TTC TTG TGC CTG CCT CTA CCA CTG AGA CCG CTC GGC CGC
NDNKNTDGDGDSGEPA>

a a a TRANSLATION OF 911G FULL 5/20 [A] a a a - > 

. 250 260 270 280 290 

* ♦ ' ♦ * * . 

CAC AAA AAG CTC AAA ACC AAG AAG GAA CTG GAG GAG GAG GAG CGC GAA 
GTG TTT TTC GAG TTT TGG TTC TTC CTT GAC CTC CTC CTC CTC GCG CTT 

HKKLKTKKELEEE ERE> 
a a a TRANSLATION OF 911G FULL 5/20 [A] a a a > 



300 310 320 330 340 

* ★ * * * 

CGA ATG CAG GTT CTC GTT TCC AAC TTT ACT GAA GAA CAG CTG GAT CGC 
GCT TAG GTC CAA GAG CAA AGG TTG AAA TGA CTT CTT GTC GAC CTA GCG 
RMQVLVSNFTEEQL DR> 
a a a TRANSLATION OF 911G FULL 5/20 [AJ a a a > 

350 360 370 380 390 

* ★ * * 

TAC GAA ATG TAT CGT CGC TCA GCC TTT CCC AAG GCC GCC GTC AAG CGT 
ATG CTT TAC ATA GCA GCG AGT CGG AAA GGG TTC CGG CGG CAG TTC GCA 
YEMYRRSAFPKAAVKR> 
a a a TRANSLATION OF 911G FULL 5/20 [A) a a a > 

400 410 420 430 

* * * * 

CTA ATG CAA ACT ATC ACC GGC TGT TCC GTG TCC CAA AAT GTT GTG ATA 
GAT TAC GTT TGA TAG TGG CCG ACA AGG CAC AGG GTT TTA CAA CAC TAT 
LMQTITGCSVSQNVVI>
a a a TRANSLATION OF 911G FULL 5/20 (A) a a a . > 

440 450 460 470 480 

* « . * ★ * 

GCC ATG TCC GGC ATT GCG AAG GTC TTC GTC GGC GAG GTT GTG GAG GAA 
CGG TAG AGG CCG TAA CGC TTC CAG AAG CAG CCG CTC CAA CAC CTC CTT 

AMSG IAKVFVGEVVEE> 
a a a TRANSLATION OF 911G FULL 5/20 (A) a a a > 

490 500 510 520 530 

***** 
GCC CTC GAC GTG ATG GAG GCC CAA GGT GAA TCC GGT GCC CTG CAG CCC 
CGG GAG CTG CAC TAC CTC CGG GTT CCA CTT AGG CCA CGG GAC GTC GGG 
ALDVME AQGESGAL QP> 
a a a TRANSLATION OF 911G FULL 5/20 [A] a a a > 

540 550 560 570 580 

* * * *. ■ * 

AAA TTC ATA CGA GAG GCA GTG CGA CGA CTG AGG ACC AAG GAT CGG ATG 
TTT AAG TAT GCT CTC CGT CAC GCT GCT GAC TCC TGG TTC CTA GCC TAC 
KFIR EAVRRLRTKD RM> 
a a a TRANSLATION OF 911G FULL 5/20 (A) a a a > 

590 600 610 620 , 630 

* * * * ♦ 
CCC ATA GGC AGA TAC CAG CAG CCC TAT TTC AGA CTG AAC TAG C GAGTC 
GGG TAT CCG TCT ATG GTC GTC GGG ATA AAG TCT GAC TTG ATC G CTCAG 

PIGRYQQPYFRLN*X>
a a TRANSLATION OF 911G FULL 5/20 (A) a a a > 

640 650 660 670 680 690 

* * * * * * 
GAGACATTAA GAAATATAGT TTGTAAATCT GTTAGTGAAT ATAAAAATAC ATAAACAAGT 
CTCTGTAATT CTTTATATCA AACATTTAGA CAATCACTTA TATTTTTATG TATTTGTTCA 

700 710 720 730 

* * * « 

AAAAAGTAAA TAAATATAAA GATTTTTTCA AGAAAAAAAA AAAAAAAAG
TTTTTCATTT ATTTATATTT CTAAAAAAGT TCTTTTTTTT TTTTTTTTC 
10 20 30 40

ACC ATG TTG CTT CCG AAC ATC CTG CTC ACC GGT ACA CCA GGG GTT GGA
TGG TAG AAC GAA GGC TTG TAG GAC GAG TGG CCA TCT GGT CCC CAA CCT

50 60 70 80 90

AAA ACC ACA CTA GGC AAA GAA CTT GCG TCA AAA TCA GGA CTG AAA TAG
TTT TGG TGT GAT CCG TTT CTT GAA CGC AGT TTT AGT CCT GAC TTT ATG

100 110 120 130 140

ATT AAT GTG GGT GAT TTA GCT CGA GAA GTC TGA TCA TCG GAT ATC ATG
TAA TTA CAC CCA CTA AAT CGA GCT CTT CAG ACT AGT AGC CTA TAG TAC
M>

150 160 170 180 190

GAG TCT GGC AAG ACG GCT TCT CCC AAG AGC ATG CCG AAA GAT GCA CAC
CTC AGA CCG TTC TGC CGA AGA GGG TTC TCG TAC GGC TTT CTA CGT GTC
KSGKTASPKSMPKDAQ>

200 210 220 230 240

ATG ATG GCA CAA ATC CTG AAG GAT ATG GGG ATT ACA GAA TAT GAG CCA
TAC TAC CCT GTT TAG GAC TTC CTA TAC CCC TAA TGT CTT ATA CTC GGT
KKAQILKDMGITEYEP>

250 260 270 280

AGA GTT ATA AAT CAG CGA TAT GTG ACC ACA ATG TTG GAC TTT GCC TTC
TCT CAA TAT TTA GTC GCT ATA CAC TGG TGT TAC AAC CTC AAA CGG AAG
RVIWQRYVTTMLEFAF>

290 300 310 320 330

ATT CTA GAT GAT GCA AAA ATT TAT TCA AGC CAT GCT AAG AAA GCT ACT
TAA GAT CTA CTA CGT TTT TAA ATA ACT TCG GTA CGA TTC TTT CGA TCA
ILDDAKIYSSHAKKAT>

340 350 360 370 380

GTT GAT GCA GAT GAT GTG CGA TTG GCA ATC CAG TGC CGC GCT GAT CAG
CAA CTA CGT CTA CTA CAC GCT AAC CGT TAG GTC ACG GCG CGA CTA GTC
VDADDVRLAIQCRADQ>

390 400 410 420 430

TCT TTT ACC TCT CCT CCC CCA AGA GAT TTT TTA TTA GAT ATT GCA AGG
AGA AAA TGG AGA GGA GGG GGT TCT CTA AAA AAT AAT CTA TAA CGT TCC
SFTSPPPRDFLLDIAR>

440 450 460 470 480

CAA AGA AAT CAA ACC CCT TTG CCA TTG ATC AAG CCA TAT TCA GGT CCT
GTT TCT TTA GTT TGG GGA AAC GGT AAC TAG TTC GGT ATA AGT CCA GGA
QRNQTPLPLIKPYSGP>

490 500 510 520

AGC TTG CCA CCT GAT AGA TAC TGC TTA ACA GCT CCA AAC TAT AGG CTG
TCC AAC GGT GGA CTA TCT ATG ACG AAT TGT CGA GGT TTC ATA TCC GAC
SLPPDKYCLTAPNYRL>

530 540 550 560 570

AAA TCT TTA CAG AAA AAG GCA TCA ACT TCT GCG GGA AGA ATA ACA GTC
TTT AGA AAT GTC TTT TTC CGT AGT TGA AGA CGC CCT TCT TAT TCT CAG
KSLQKKASTSAGRITV>

580 590 600 610 620

CCG CGG TTA AGT GTT GGT TCA GTT ACT AGC AGA CCA AGT ACT CCC ACA
GGC GCC AAT TCA CAA CCA AGT CAA TGA TCG TCT GGT TCA TGA GGG TGT
PRLSVGSVTSRPSTPT>

630 640 650 660 670

CTA GGC ACA CCA ACC CCA CAG ACC ATG TCT GTT TCA ACT AAA GTA GGG
GAT CCG TGT GGT TGG GGT GTC TGG TAC AGA CAA AGT TGA TTT CAT CCC
LGTPTPQTMSVSTKVG>

680 690 700 710 720

ACT CCC ATG TCC CTC ACA GGT CAA AGG TTT ACA GTA CAG ATG CCT ACT
TGA GGG TAG AGG GAG TGT CCA GTT TCC AAA TGT CAT GTC TAC GGA TGA
TPKSLTGQRFTVQMPT>

730 740 750 760

TCT CAG TCT CCA GCT GTA AAA GCT TCA ATT CCT GCA ACC TCA GCA GTT
AGA GTC AGA GGT CGA CAT TTT CGA AGT TAA GGA CGT TGG AGT CGT CAA
SQSPAVKASIPATSAV>

770 780 790 800 810

CAG AAT GTT CTG ATT AAT CCA TCA TTA ATC GGG TCC AAA AAC ATT CTT
GTC TTA CAA GAC TAA TTA GGT AGT AAT TAG CCC AGG TTT TTG TAA GAA
QNVLINPSLIGSKNIL>

820 830 840 850 860

ATT ACC ACT AAT ATG ATG TCA TCA CAA AAT ACT CCC AAT GAA TCA TCA
TAA TGG TGA TTA TAC TAC AGT AGT GTT TTA TGA CGG TTA CTT AGT AGT
ITTNMMSSQNTANESS>

870 880 890 900 910

AAT GCA TTC AAA AGA AAA CGT GAA GAT GAT GAT GAT GAC GAT GAT GAT
TTA CGT AAC TTT TCT TTT GCA CTT CTA CTA CTA CTA CTG CTA CTA CTA
NALKRKREDDDDDDDD>

920 930 940 950 960

GAT GAT GAC TAT GAT AAT CTG TAA TCT AGC CTT GCT GAA TGT AAC ATG
CTA CTA CTG ATA CTA TTA GAC ATT AGA TCG GAA CGA CTT ACA TTG TAC
DDDYDNL>

970 980 990 1000

TAT ACT TGG TCT TGA ATT CAT TGT ACT GAT ATT AAA CAT GCA TGC TGG
ATA TGA ACC AGA ACT TAA GTA ACA TGA CTA TAA TTT GTA CGT ACG ACC

1010 1020 1030 1040 1050

ATG TTT TCA AGT TGT GTT TTA GAA AAC TAA TAA TAA TGA GTA AAC ACA
TAC AAA AGT TCA ACA CAA AAT CTT TTC ATT ATT ATT ACT CAT TTG TGT

1060 1070 1080 1090 1100

GTT ACC ATA CTT TTC AAT TGA AAT GAA GGT TTT TCA TCA GCC TTA AAA
CAA TGG TAT GAA AAG TTA ACT TTA CTT CCA AAA AGT ACT CGG AAT TTT

1110 1120 1130 1140 1150

GTG TAA GAA AAA TAA AGT TGT CAT TCA TTC GAT AAA AAA AAA AAA A
CAC ATT CTT TTT ATT TCA ACA GTA AGT AAG CTA TTT TTT TTT TTT T
hTAFII30α peptides:

1. DVQLHLERQ_NMJ[PGFGSEEI_pyK

2. KKLQDLVREVDPNEQLDEDV_EMLLQIADD

3. LQDLVREVDPN

hTAFII30β peptides:

1. VFVGEWEEALDVEEKP

2. HMREAVRRLK

3. MQILVSSFEEEQLNLYEMYN_KLAyC3Q
hTAF I 48 -> Genes

DNA sequence 1578 b.p. ATTCCAAGCTAA ... GTCTGTTTTCTT linear

Read from Bionet/Intelligenetics file "48 prof

1 ATTCCAAGCTAAATTTAGGCGGGT ATG AGT GAT TTC AGT GAA GAA TTA AAA GGG CCT GTG ACA GAT 66
1 M S D F S E E L K G P V T D 14

67 GAT GAA GAA GTG GAA ACA TCT GTG CTC AGT GGT GCA GGA ATG CAT TTT CCT TGG CTT CAA 126
15 D E E V E T S V L S G A G M H F P W L Q 34

127 ACA TAG GTA GAA ACT GTG GCC ATT GGA GGG AAA AGG AGG AAG GAT TTT GCT CAG ACA ACA 186
35 T Y V E T V A I G G K R R K D F A C T T 54

187 AGT GCT TGT TTA AGT TTT ATC CAA GAA GCT CTG CTC AAG CAC CAA TGG CAG CAA GCT GCA 246
55 S A C L S F I Q E A L L K H Q W Q Q A A 74

247 GAA TAG ATC TAG AGT TAT TTT CAG ACC TTG GAA GAT TCA CAT AGC TAC AAA AGG CAG GCT 306
75 E Y M Y S Y F Q T L E D S D S Y K R Q A 94

307 GCA CCT GAG ATT ATT TGG AAG CTC GGA AGT GAA NNN NNN TTT TAT CAT CCC AAA AGC AAC 366
95 A P E I I W K L G S E I L F Y H P K S N 114

367 ATG GAG AGT TTC AAT ACT TTT GCT AAC CGG ATG NNN NNN ATT GGC GTC ATG AAT TAT TTA 426
115 M E S F N T F A N R M K N I G V M K Y L 134

427 AAG ATC TCC TTA CAA CAT GCA TTA TAC CTT CTG CAT CAT GGA ATG CTT AAA GAT GCT AAG 486
135 K I S L Q H A L Y L L H H G H L K D A K 154

487 AGA AAT CTG AGT GAG GCA GAG ACA TGG AGA CAT GNT NNN AAT ACG TCT TCC CGG GAA ATA 546
155 R N L S E A E T W R H G E N T S S R E I 174

547 TTA ATC AAC CTT ATT CAG GCC TAT AAA GGG CTT NNN NNN TAT TAT ACC TGG TCT GAA AAG 606
175 L I N L I Q A Y K G L L Q Y Y T W S E K 194

607 AAG ATG GAA TTC TCA AAG CTT CAT AAG GAT GAT NNN NNN TAC AAT GCA CTA GCC CAG GAT 666
195 K M E L S K L X K D D X X Y N A V A Q D 214

667 GTG TTC AAC CAC AGC TGG AAG ACA TCT GCA AAT ATT TCT GCA TTG ATT AAA ATT CCT GGA 726
215 V F N H S H K T S A N I S A L I K I P G 234

727 GTT TGG GAC CCT TTT GTG AAG AGT TAT GTA GAA ATG CTG GAA TTC TAT GGG GAT CCA GAT 786
235 V H D P F V K S Y V E M L E F Y G D R D 254

787 GGA GCC CAA GAG GTA CTC ACC AAT TAT GCA TAT GAT GAA AAG TTT CCA TCA AAT CCA AAT 846
255 G A Q E V L T N Y A Y D E K F P S N P N 274

847 GCC CAT ATC TAG TTA TAC AAC TTT CTA AAC AGA CAG AAG GCA CCA AGA TCA AAA TTG ATA 906
275 A H I Y L Y N F L X R Q K A P R S K L I 294

907 ACT GTG CTT AAG ATT TTC TAT CAG ATT GTA CCA TCT NNN AAA TTG ATG TTG GAA TTC CAT 966
295 S V L K I L Y Q I V P S H K L M L E F H 314

967 ACA TTA CTT AGA AAA TCA GAA AAA GAA GAA CAC CCT AAA CTG GGG TTG GAG CTA TTA TTT 1026
315 T L L R K S E K E E H R K L C L E V L F 334

1027 GGA GTC TTA CAT TTT GCC GGA TGC ACT AAG AAT ATA ACT GCT TGG AAA TAC TTG GCA AAA 1086
335 G V L D F A G C T K N I T A H K Y L A K 354

1087 TAT CTG AAA AAT ATC TTA ATG GGA AAC CAC CTT GCC TGG CTT CAA CAA GAG TGG AAC TCC 1146
355 Y L K N I L M G N H L A W V Q Z E W N S 374

1147 AGG AAA AAC TGG TGG CCA GGG TTT CAT TTC AGC TAC TTT TGG GCA AAA AGT GAT TGG AAG 1206
375 R K N W W P G F H F S Y F W A K S D W K 394

1207 GAA GAT ACA GCT TTC GCC TGT GAG AAA GCT TTT GTG GCT GGT TTA CTG TTA GGA AAA GGT 1266
395 E D T A L A C E K A F V A G L L L G K G 414

1267 TGT AGA TAT TTC CGG TAT ATT TTA AAG CAA GAT CAC CAA ATC TTA GGG AAC AAA ATT AAC 1326
415 C R Y F R Y I L K Q D H Q I L G K K I K 434

1327 CGG ATG AAG AGA TCT GTG AAA AAA TAC AGT ATT GTA AAT CCA AGA CTC
435 R X K R S V K K Y S I V N P R

L 


TCA 


TACTGAATTTTA 


1389 

4-1 
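Each row of the listing above pairs a run of codons with the residues they encode. As an illustration only (Python and the helper name `translate` are assumptions, not part of the application), the standard genetic code can be applied codon by codon to reproduce a residue line from its nucleotide line. Comparing the result with a printed residue line is also a quick way to flag positions where a codon and its residue disagree.

```python
# Standard genetic code with bases in TCAG order: the codon built from
# BASES[i], BASES[j], BASES[k] maps to AMINO_ACIDS[16*i + 4*j + k].
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(codons: str) -> str:
    """Translate an in-frame DNA string one codon at a time.

    Whitespace is ignored, so a row of space-separated codons can be passed
    directly. Codons not in the table (e.g. containing N) become "X".
    """
    seq = "".join(codons.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3))


row = "GAT GAA GAA GTG GAA ACA TCT GTG CTC AGT GGT GCA GGA ATG CAT TTT CCT TGG CTT CAA"
print(translate(row))  # DEEVETSVLSGAGMHFPWLQ
```

Applied to the first row of the listing (nucleotides 67 to 126), the sketch returns DEEVETSVLSGAGMHFPWLQ, which matches residues 15 to 34.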
hTAFI110 cDNA and deduced amino acid sequence

Dr. R. Tjian laboratory. Department of Molecular and Cell Biology, 
University of California, Berkeley, 
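In the listing below, the deduced sequence begins at the first ATG of the cDNA (positions 185 to 187 in the listing's numbering) and runs in frame to the TGA stop codon at positions 2792 to 2794. The following is a minimal Python sketch of that reading. It assumes the standard genetic code and takes the first ATG as the start. The helper `first_orf` and its `offset` parameter are hypothetical, and a real analysis would also weigh start-codon context, the other reading frames and the reverse strand.

```python
from typing import Tuple

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def first_orf(cdna: str, offset: int = 0) -> Tuple[int, str]:
    """Translate from the first ATG until a stop codon or the end of the input.

    Returns the 1-based position of the A of that ATG, shifted by `offset` so a
    fragment can be reported in the full listing's numbering, plus the protein.
    """
    seq = "".join(cdna.split()).upper()
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no ATG start codon found")
    protein = []
    for pos in range(start, len(seq) - 2, 3):  # only complete codons
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":  # a stop codon ends the reading frame
            break
        protein.append(residue)
    return offset + start + 1, "".join(protein)


row_151_200 = "GTGGCCAGTGTGGTGCTTGGTCCTTGTTTCCAGGATGGACTTCCCCAGCT"
print(first_orf(row_151_200, offset=150))  # (185, 'MDFPS')
```

Run on nucleotides 151 to 200 with `offset=150`, the sketch reports the start at position 185 and the first five residues, MDFPS, as in the listing.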

10 20 30 40 50
GCTCGAGTGCCAAAGCTGGGGTTCTACTTGAGATTTCCCTCGTGGTGCCA

60 70 80 90 100
GGGTCCGGCGAGCATCACGCCGAGGCCCATTTTCCAGACGACCACGACGA

110 120 130 140 150
GGCCGGGGTCACGAACTCTGGCGCCCCTTACCAGCTTCCAGTCTCTCGAG

160 170 180 190 200
GTGGCCAGTGTGGTGCTTGGTCCTTGTTTCCAGGATGGACTTCCCCAGCT
MDFPS>

210 220 230 240 250
CCCTCCGCCCTGCGTTGTTTCTGACCGGCCCCCTTGGTCTGAGCGACGTC
SLRPALFLTGPLGLSDV>

260 270 280 290 300
CCTGACCTCTCTTTCATGTGCAGCTGGCGAGACGCACTGACTCTGCCAGA
PDLSFMCSWRDALTLPE>

310 320 330 340 350
GGCCCAGCCCCAGAACTCAGAGAATGGGGCACTGCATGTGACCAAGGACC
AQPQNSENGALHVTKD>

360 370 380 390 400
TGCTGTGGGAGCCGGCAACCCCTGGGCCTCTCCCCATGCTGCCTCCCCTC
LLWEPATPGPLPMLPPL>

410 420 430 440 450
ATCGATCCCTGGGACCCTGGCCTGACTGCCCGGGACCTGCTTTTCCGCGG
IDPWDPGLTARDLLFRG>

460 470 480 490 500
AGGGTACCGGTATCGGAAGCGGCCCCGAGTCGTGCTGGATGTGACTGAGC
GYRYRKRPRVVLDVTE>

510 520 530 540 550
AGATCAGCCGGTTCCTCTTGGATCATGGAGACGTAGCCTTTGCGCCCCTG
QISRFLLDHGDVAFAPL>

560 570 580 590 600
GGGAAGCTGATGCTGGAGAATTTCAAGCTGGAGGGAGCGGGGAGCCGCAC
GKLMLENFKLEGAGSRT>

610 620 630 640 650
TAAGAAGAAGACAGTGGTCAGTGTGAAGAAGCTGCTCCAGGACCTCGGTG
KKKTVVSVKKLLQDLG>

660 670 680 690 700
GACACCAGCCCTGGGGGTGTCCCTGGGCTTACCTCAGCAACCGACAGCGC
GHQPWGCPWAYLSNRQR>

710 720 730 740 750
CGCTTCTCTATCCTCGGGGGCCCCATCCTGGGCACGTCGGTGGCGAGCCA
RFSILGGPILGTSVASH>

760 770 780 790 800
CTTGGCAGAGCTGCTGCACGAGGAGCTGGTGCTGCGGTGGGAGCAGCTGC
LAELLHEELVLRWEQL>

810 820 830 840 850
TTCTGGATGAGGCCTGCACTGGGGGCGCGCTGGCCTGGGTTCCTGGAAGG
LLDEACTGGALAWVPGR>

860 870 880 890 900
ACACCCCAGTTCGGGCAGCTGGTCTACCCTGCTGGAGGCGCCCAGGACAG
TPQFGQLVYPAGGAQDR>

910 920 930 940 950
GCTGCATTTCCAAGAGGTCGTTCTGACCCCAGGTGACAATCCCCAATTCC
LHFQEVVLTPGDNPQF>

960 970 980 990 1000
TTGGGAAACCTGGACGCATCCAGCTCCAGGGACCTGTCCGGCAAGTGGTG
LGKPGRIQLQGPVRQVV>

1010 1020 1030 1040 1050
ACATGCACCGTCCAGGGAGAAAGTAAGGCCCTTATATACACTTTCCTCCC
TCTVQGESKALIYTFLP>

1060 1070 1080 1090 1100
TCACTGGCTGACCTGCTACCTGACCCCTGGCCCTTTCCATCCCTCCTCAG
HWLTCYLTPGPFHPSS>

1110 1120 1130 1140 1150
CTCTGCTGGCCGTCCGCTCTGACTACCACTGTGCCGTGTGGAAGTTTGGT
ALLAVRSDYHCAVWKFG>

1160 1170 1180 1190 1200
AAACAGTGGCAGCCAACCCTTCTGCAGGCGATGCAGGTGGAGAAAGGGGC
KQWQPTLLQAMQVEKGA>

1210 1220 1230 1240 1250
CACGGGGATCAGCCTCAGCCCTCACCTGCCCGGGGAGCTGGCCATCTGCA
TGISLSPHLPGELAIC>

1260 1270 1280 1290 1300
GCCGCTCGGGAGCCGTCTGCCTGTGGAGCCCTGAGGATGGGCTGCGGCAA
SRSGAVCLWSPEDGLRQ>

1310 1320 1330 1340 1350
ATCTACAGGGACCCTGAGACCCTCGTGTTCCGGGACTCCTCTTCGTGGCG
IYRDPETLVFRDSSSWR>

1360 1370 1380 1390 1400
TTGGGCAGACTTCACTGCGCACCCTCGGGTGCTGACCGTGGGTGACCGCA
WADFTAHPRVLTVGDR>

1410 1420 1430 1440 1450
CCGGAGTGAAGATGCTGGACACTCAGGGCCCGCCGGGCTGTGGTCTGTTG
TGVKMLDTQGPPGCGLL>

1460 1470 1480 1490 1500
CTTTTTCGTTTGGGGGCAGAGGCTTCGTGCCAGAAAGGGGAACGTGTCCT
LFRLGAEASCQKGERVL>

1510 1520 1530 1540 1550
GCTTACCCAGTACCTGGGGCACTCCAGCCCCAAATGCCTCCCCCCTACTC
LTQYLGHSSPKCLPPT>

1560 1570 1580 1590 1600
TTCATCTCGTCTGTACCCAGTTCTCTCTCTACCTAGTGGACGAGCGCCTT
LHLVCTQFSLYLVDERL>

1610 1620 1630 1640 1650
CCCCTGGTGCCGATGCTGAAGTGGAACCATGGCCTCCCCTCCCCGCTCCT
PLVPMLKWNHGLPSPLL>

1660 1670 1680 1690 1700
GCTGGCCCGACTGCTGCCTCCGCCCCGGCCCAGCTGCGTGCAGCCCCTGC
LARLLPPPRPSCVQPL>

1710 1720 1730 1740 1750
TCCTCGGAGGCCAGGGTGGGCAGCTGCAGCTGCTGCACCTGGCAGGAGAA
LLGGQGGQLQLLHLAGE>

1760 1770 1780 1790 1800
GGGGCGTCGGTGCCCCGCCTGGCAGGCCCCCCCCAGTCTCTTCCTTCCAG
GASVPRLAGPPQSLPSR>

1810 1820 1830 1840 1850
GATCGACTCCCTCCCTGCATTTCCTCTGCTGGAGCCTAAGATCCAGTGGC
IDSLPATPLLEPKIQW>

1860 1870 1880 1890 1900
GGCTGCAGGAGCGCCTGAAAGCACCGACCATAGGTCTGGCTGCCGTCGTC
RLQERLKAPTIGLAAVV>

1910 1920 1930 1940 1950
CCGCCCTTGCCCTCAGCGCCCACACCAGGCCTGGTGCTCTTCCAGCTCTC
PPLPSAPTPGLVLFQLS>

1960 1970 1980 1990 2000
GGCGGCGGGAGATGTCTTCTACCAGCAGCTCCGCCCCCAGGTGGACTCCA
AAGDVFYQQLRPQVDS>

2010 2020 2030 2040 2050
GCCTCCGCAGAGATGCTGGGCCTCCTGGCGACACCCAACCTGACTGCCAT
SLRRDAGPPGDTQPDCH>

2060 2070 2080 2090 2100
GCCCCCACAGCTTCCTGGACCTCCCAGGACACTGCCGGCTGCAGCCAGTG
APTASWTSQDTAGCSQW>

2110 2120 2130 2140 2150
GCTGAAGGCCCTGCTAAAAGTGCCCCTGGCTCCTCCTGTCTGGACAGCAC
LKALLKVPLAPPVWTA>

2160 2170 2180 2190 2200
CCACCTTCACCCACCGCCAGATGCTGGGCAGCACAGAGCTGCGGAGGGAG
PTFTHRQMLGSTELRRE>

2210 2220 2230 2240 2250
GAAGAGGAAGGGCAGCGGCTGGGTGTGCTCCGCAAGGCCATGGCCCGAGG
EEEGQRLGVLRKAMARG>

2260 2270 2280 2290 2300
GCAGCTCCTGCTGCAGAGAGACCTGGGCTCCCTCCCTGCGGCAGAGCCAC
QLLLQRDLGSLPAAEP>

2310 2320 2330 2340 2350
CCCCTGCACCCGAGTCAGGCCTAGAGGACAAGCTCAGTGAGCGCCTGGGG
PPAPESGLEDKLSERLG>

2360 2370 2380 2390 2400
GAAGCCTGGGCAGGCCGAGGGGCTGCCTGGTGGGAGAGGCAGCAGGGCAG
EAWAGRGAAWWERQQGR>

2410 2420 2430 2440 2450
GACCTCGGAGCCCGGGAGACAGACCAGGCGGCCCAAGCGCCGGACCCAGC
TSEPGRQTRRPKRRTQ>

2460 2470 2480 2490 2500
TGTCCAGCAGCTTTTCGCTCAGTGGCCATGTGGATCCGTCAGAGGACACC
LSSSFSLSGHVDPSEDT>

2510 2520 2530 2540 2550
AGCTCCCCTCATAGCCCTGAGTGGCCACCTGCTGATGCTCTGCCCCTGCC
SSPHSPEWPPADALPLP>

2560 2570 2580 2590 2600
CCCCACGACCCCGCCCTCCCAGGAGTTGACTCCGGATGCATGCGCCCAGG
PTTPPSQELTPDACAQ>

2610 2620 2630 2640 2650
GCGTCCCATCAGAGCAGCGGCAGATGCTCCGTGACTACATGGCCAAGCTA
GVPSEQRQMLRDYMAKL>

2660 2670 2680 2690 2700
CCACCCCAGAGGGACACCCCAGGCTGTGCCACCACACCTCCCCACTCCCA
PPQRDTPGCATTPPHSQ>









WO 94/17087



PCT/US94/01114 



2710 2720 2730 2740 2750
GGCCTCCAGCGTCCGGGCCACTCGCTCCCAGCAGCACACACCCGTCCTCT
ASSVRATRSQQHTPVL>

2760 2770 2780 2790 2800
CTAGCTCTCAGCCCCTCCGGAAGAAGCCTCGAATGGGCTTCTGAGGACAC
SSSQPLRKKPRMGF>

2810 2820 2830 2840 2850
AAGGTGGGCTGCCCTCAAGCCCCAGAGAGCCCCTCATCCTTCCTCTGGGA

2860 2870 2880 2890 2900
CCAGATGTGCCTTCCACAGTTGAAACTTGAGAAGCAGAGCTCGCCACCTT

2910 2920 2930 2940 2950
CTGGAGGCCACTGTGATGATGAGCCAAGCAAnTGGAGCCAAGTTGAAGG

2960 2970 2980 2990 3000
GACAGGGCAACAAAATACAGTAGTAGTTTCTTTTGTATTTTGTATATTCG

3010 3020 3030 3040 3050
CCTGAAGATCATCCCGCAAGGCAGGCTGGAGGTGCCGGTGGGCCTGTGTT

3060 3070 3080 3090 3100
GCTGGGATTTTAGTCTGTGCTGGGAGGCAGGGCTCCGTGCGCCTCAGCTG

3110 3120 3130 3140 3150

3160 3170 3180 3190 3200
ATGGGGGCCAGGAGTGCTGGCTCCTCGTGTTTGGTGAGGGTGGAGTGAGG

3210 3220 3230 3240 3250
CCCCTGCAGAGCTGCTGATGAGGTGGGCACAGCGGCCGTTGGCAGCTGCT

3260 3270 3280 3290 3300
GTTGTGGGTTGCTTTGTCAATCTCTGCCCCGGTCTGATGTTTCCTACAGG

3310 3320 3330 3340 3350
GAGATGCCGTGGATCCAGGTTCAGGGACTAAATACACTTGGCAGCTGAAG

3360 3370 3380 3390 3400
ATGAATTGGAATGGTCACGTTTTTTAGGCTGGNACAGCGTCCCGCCACAG

3410 3420 3430 3440 3450
CTACTACCTGACACTGAGCTCATGCAGAGAGATGATGGCTGATGTTCCTT

3460 3470 3480 3490 3500
CTCCCTTGGGACATGGGTCTGGCACCTGTGGGCTGTCGATAGTGCCCTCT

3510 3520 3530 3540 3550
GAGCAGAGGGTCACGGTCATGTCAGTTTGGGGGAATTCTCTGTTGTGCCT

3560 3570 3580 3590 3600
CAGAGACTCCCCCCTTTCTTTCCTCCCTCCCCTTCTCATTTTGATGTCTA

3610 3620 3630 3640 3650
AAGCATCAAGTCCCTCTTCCTCAGAGTTTCTCTAGCTGCAGTGGAAGATT

3660 3670 3680 3690 3700
CTGTTTTCCTGTGGGGAAAATGCTCACTTGAGATTTTGCAGGGACCCGGG

3710 3720 3730 3740 3750
TCTGTCTGGTTTCTGATGACATAGTAAGAGAAAGGTCTTTTTTCAGGTTG

3760 3770 3780 3790 3800
GCTGGTGAAAGGAATTGCATGTGACTCACACAAACAGGAGCTAGCCCAAT

3810 3820 3830 3840 3850
CATACACTGACTCGCGTGGGTGTTTAAATGTTTATCATGCCTAAGGGAGA

3860 3870 3880 3890 3900
CATTTATAATTAAACCATTTATGCTACATAAAAAAAAAAAAAAAAAAAAA
AA
WHAT IS CLAIMED IS:

1. A composition comprising a substantially pure, biologically active portion of a TAF, wherein said TAF is other than CCG1.

2. A composition according to claim 1 wherein said portion is a substantially full-length TAF.

3. A composition according to claim 1 wherein said TAF is selected from the group consisting of dTAFII250, dTAFII150, dTAFII110, dTAFII80, dTAFII60, dTAFII40 and dTAFII30.

4. A composition according to claim 1 wherein said TAF is selected from the group consisting of hTAFII250, hTAFII150, hTAFII130, hTAFII100, hTAFIHO, hTAFII40 and hTAFII30.

5. A composition according to claim 1 wherein said TAF is selected from the group consisting of hTAFI110, hTAFI63 and hTAFI48.

6. A composition according to claim 1 wherein said TAF is selected from the group consisting of hTAFIII172 and hTAFIII25.

7. A composition comprising an isolated nucleic acid sequence encoding a portion of a TAF according to claim 1.

8. A composition according to claim 7 wherein said portion is a substantially full-length TAF.

9. A composition according to claim 7 wherein said TAF is selected from the group consisting of dTAFII250, dTAFII150, dTAFII110, dTAFII80, dTAFII60, dTAFII40 and dTAFII30.

10. A composition according to claim 7 wherein said TAF is selected from the group consisting of hTAFII250, hTAFII150, hTAFII130, hTAFII100, hTAFIHO, hTAFII40 and hTAFII30.

11. A composition according to claim 7 wherein said TAF is selected from the group consisting of hTAFI110, hTAFI63 and hTAFI48.

12. A composition according to claim 7 wherein said TAF is selected from the group consisting of hTAFIII172 and hTAFIII25.

13. An antibody that specifically binds a composition according to claim 1.

14. A vector comprising a nucleic acid sequence according to claim 7 operably linked to a transcription regulatory element.

15. A cell comprising a nucleic acid sequence according to claim 7.

16. A process for the production of a TAF comprising culturing the cell of claim 15 under conditions suitable for the expression of said TAF and recovering said TAF.

17. A composition comprising a recombinantly produced TAF. 

18. A method of identifying an agent useful in the diagnosis or treatment of disease associated with transcription, said method comprising the steps of:

contacting an agent with at least a portion of a TAF according to claim 1; and,

determining whether said agent specifically binds said TAF.

19. A method of identifying an agent useful in the diagnosis or treatment of disease associated with transcription, said method comprising the steps of:

adding an agent to a mixture comprising at least a portion of a TAF according to claim 1;

comparing the association of mixture components before and after said adding step; and,

identifying an agent that alters the association of mixture components.

20. A method for treating disease, said method comprising:

identifying an agent according to the method of claim 18; and,

contacting an individual with said agent;

wherein said agent modulates transcription in said individual.

21. A method for treating disease, said method comprising:

identifying an agent according to the method of claim 19; and,

contacting an individual with said agent;

wherein said agent modulates transcription in said individual.